
CHAPTER 8: SYSTEMATIC SAMPLING

8.1 Definition of Systematic Sample and Selection Procedures

Systematic sampling is a commonly used selection procedure. It means selecting units at a fixed
interval from a list, starting from a randomly determined point. In other words, units are selected
from a list using a selection interval, say $k$, so that every $k$th element on the list, following a
random start between 1 and $k$, is included in the sample. Systematic sampling is normally carried
out as follows, depending on the case.

8.1.1 When the interval (k) is an integer or assumes a whole number:

If a sample of $n$ units is to be selected systematically from a list of a population of $N$ units,
where $N$ is a multiple of $n$, then $k = N/n$ is the sampling interval, and a random number
between 1 and $k$ is chosen. Let this random number be $j$. We then select the units numbered
$j,\ j + k,\ j + 2k,\ j + 3k,\ \ldots,\ j + (n-1)k$, which gives the required sample size.

For example, consider a farm survey in which $n = 8{,}000$ farms are to be selected using
systematic sampling from $N = 144{,}000$ farms. The selection interval is
$k = N/n = 144{,}000/8{,}000 = 18$. We must find a random start between 1 and 18 inclusive,
and then take every 18th farm thereafter. If the random start is 12, then the units in the sample
are the farms numbered 12, 30, 48, 66, 84, ..., 143,994. Once the sampling interval $k$ is
determined, the random selection of the starting point determines the whole sample. In this
case there are 18 possible samples, one for each $j = 1, 2, \ldots, 18$.

For this design the sampling fraction is

\[ f = \frac{1}{k} = \frac{1}{N/n} = \frac{n}{N} = \frac{1}{18}. \]
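The selection rule of Section 8.1.1 can be sketched in a few lines of Python. This is only an illustration under the section's assumption that $N$ is a multiple of $n$; the function names `systematic_units` and `draw_systematic_sample` are hypothetical and not part of the original text.

```python
import random


def systematic_units(j, k, n):
    """Unit numbers j, j+k, ..., j+(n-1)k for a given start j and interval k."""
    return [j + i * k for i in range(n)]


def draw_systematic_sample(N, n, rng=random):
    """Draw a linear systematic sample of n units from units 1..N.

    Assumes N is an exact multiple of n (the case of Section 8.1.1).
    """
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                 # sampling interval
    j = rng.randint(1, k)      # random start, 1 <= j <= k (both ends inclusive)
    return systematic_units(j, k, n)


# Farm survey from the text: N = 144,000, n = 8,000, random start 12.
farms = systematic_units(12, 144000 // 8000, 8000)
print(farms[:5], "...", farms[-1])
```

With the random start fixed at 12 the list begins 12, 30, 48, 66, 84 and ends at 143,994, as in the farm example.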
8.1.2 When the interval $k = N/n$ is not an integer
When $N/n$ is not an integer, $k$ is taken as the integer nearest to $N/n$. There is a method of
taking a systematic sample of 1 in $k$ elements that always leads to unbiased estimates of
population means, totals, and proportions. This method requires prior knowledge of the
population size $N$. The procedure is as follows:
1. Choose a random number $j$ between 1 and $N$, where $N$ is the number of elements in the
population, so that $j$ takes one of the values $1, 2, \ldots, N$.
2. Compute the quotient $j/k$, where $k$ is the integer nearest to $N/n$.
3. Express this quotient as an integer plus a remainder.
   • If the remainder is 0, take a systematic sample of 1 in $k$ elements in the usual way,
     beginning with element $k$.
   • If the remainder is nonzero, say $m$, take a systematic sample of 1 in $k$ elements,
     beginning with element $m$.

Example: Suppose we wish to take a 1-in-6 sample from a list of 26 employees. Since $k$ is
fixed ($k = 6$), the expected sample size is $n = N/k = 26/6 = 4\tfrac{2}{6}$, so the sample
contains 4 or 5 employees depending on the starting point.
Instead of taking a random number between 1 and 6 to start the systematic sampling, we
take a random number $j$ between 1 and 26, so $j$ has 26 possible values.

Let $j = 8$. Dividing 8 by 6 gives $j/k = 8/6 = 1\tfrac{2}{6}$, so the remainder is 2 ($m = 2$).
We take a systematic sample of 1 in 6, starting with the 2nd employee, and obtain employees
2, 8, 14, 20, 26 on the list. Every 6th employee up to the end of the list must be kept; stopping
after four would leave employees 25 and 26 with no chance of selection.

When $j = 24$, the quotient is $j/k = 24/6 = 4$ and the remainder is $m = 0$. In this case we take
a systematic sample of 1 in 6 starting with element $k = 6$, and obtain employees 6, 12, 18, 24
on the list.
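The remainder rule above can be sketched as follows. The function names are hypothetical, and the sketch assumes that every $k$th unit is kept up to the end of the list. Under that assumption a start that yields more units is proportionally more likely to be chosen, which is what makes the sample mean unbiased.

```python
import random


def start_from_random_number(j, k):
    """Map a random number j in 1..N to a starting element in 1..k."""
    m = j % k                   # remainder of j / k
    return k if m == 0 else m   # remainder 0 means "begin with element k"


def remainder_systematic_sample(N, k, j=None, rng=random):
    """1-in-k systematic sample from units 1..N using the remainder rule.

    If j is not supplied, it is drawn at random between 1 and N inclusive.
    """
    if j is None:
        j = rng.randint(1, N)
    start = start_from_random_number(j, k)
    # Keep every k-th unit from the start up to the end of the list.
    return list(range(start, N + 1, k))


# The 26-employee list with k = 6 and two of the 26 possible random numbers.
print(remainder_systematic_sample(26, 6, j=8))
print(remainder_systematic_sample(26, 6, j=24))
```

Averaging the sample mean over all 26 equally likely values of $j$ reproduces the population mean for any set of values attached to the 26 units, which is a quick way to check the unbiasedness claim numerically.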

8.1.3 Circular systematic sampling

This method can be used whether or not $N$ is a multiple of $n$. It consists of choosing a random
start between 1 and $N$ and then selecting every $k$th unit in a cyclical manner until a sample of
$n$ units is obtained. The interval $k$ is $N/n$ if that is an integer, and otherwise the integer
nearest to $N/n$. In general, if $j$ is the random start, $1 \le j \le N$, the units with the following
serial numbers are selected into the sample:

\[ j + ik \ \text{ if } j + ik \le N, \qquad j + ik - N \ \text{ if } j + ik > N, \qquad i = 0, 1, 2, \ldots, n-1. \]

Example: For illustration, let the population size be $N = 14$ and the sample size $n = 5$. The
interval is $k = N/n = 14/5 = 2\tfrac{4}{5} \approx 3$.
We choose a random start $j$ between 1 and 14. Let $j = 7$. For $i = 0, 1, 2, 3, 4$, a systematic
sample of 1 in 3 gives the following serial numbers, using $j + ik$ or $j + ik - N$ as appropriate:

$7 + 0 \times 3 = 7$, if $i = 0$
$7 + 1 \times 3 = 10$, if $i = 1$
$7 + 2 \times 3 = 13$, if $i = 2$
$7 + 3 \times 3 - 14 = 2$, if $i = 3$
$7 + 4 \times 3 - 14 = 5$, if $i = 4$

Placing the serial numbers 1 to 14 on a circle shows the cyclical arrangement of the selected
units 7, 10, 13, 2, and 5.
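A minimal sketch of circular systematic selection is given below. The function name is hypothetical. The sketch uses modular arithmetic, which gives the same result as the "subtract $N$" rule whenever $j + ik$ does not exceed $2N$.

```python
def circular_systematic_units(j, k, n, N):
    """Serial numbers selected by circular systematic sampling.

    j: random start (1 <= j <= N), k: interval, n: sample size, N: population size.
    """
    if not 1 <= j <= N:
        raise ValueError("the random start j must lie between 1 and N")
    units = []
    for i in range(n):
        # Wrap around the end of the list; units are numbered 1..N, not 0..N-1.
        units.append((j + i * k - 1) % N + 1)
    return units


# Example from the text: N = 14, n = 5, k = 3, random start j = 7.
print(circular_systematic_units(7, 3, 5, 14))   # [7, 10, 13, 2, 5]
```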

8.2 The Estimated Mean and Its Variance

Systematic sampling is operationally equivalent to grouping the $N$ elements into $k$ clusters,
each containing $n$ elements that are $k$ units apart on the list, and then taking a random sample
of one of these clusters.

If the units are listed in random order, systematic sampling is similar to simple random sampling.
But if the units are listed in order of a specific attribute such as age group or income group, it is
similar to stratified sampling. Such an ordering reduces the sampling error relative to a randomly
ordered list, because each group is represented in the sample in a proportion equal to its
proportion in the underlying population.

Several formulas have been developed for the variance of $\bar{y}_{sy}$, the mean of a systematic
sample. However, a systematic sample can be treated just like a simple random sample for the
purpose of analysis, provided that certain conditions are met. The basic requirement is that the
list used as the sampling frame must not have any intrinsic regularity or periodicity of its own.

Consider all possible samples of 1 in $k$ elements.

                               Random number chosen
Sample unit   1              2              ...   j              ...   k
1             1              2              ...   j              ...   k
2             1 + k          2 + k          ...   j + k          ...   2k
3             1 + 2k         2 + 2k         ...   j + 2k         ...   3k
...           ...            ...                  ...                  ...
r             1 + (r-1)k     2 + (r-1)k     ...   j + (r-1)k     ...   rk
...           ...            ...                  ...                  ...
n             1 + (n-1)k     2 + (n-1)k     ...   j + (n-1)k     ...   nk

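Corresponding to these positions we find the values $Y_{jr}$, where $Y_{jr}$ represents the $r$th
element of the $j$th systematic sample ($r = 1, 2, \ldots, n$; $j = 1, 2, \ldots, k$) and $N = nk$.

\[ \bar{y}_j = \frac{1}{n}\sum_{r=1}^{n} y_{jr}, \qquad \bar{Y} = \frac{1}{N}\sum_{j=1}^{k}\sum_{r=1}^{n} Y_{jr} = \frac{1}{k}\sum_{j=1}^{k} \bar{y}_j \quad \text{(the mean of the sample means)} \]

\[ S^2 = \frac{\sum_{j=1}^{k}\sum_{r=1}^{n} (Y_{jr} - \bar{Y})^2}{N - 1} = \frac{\sum_{j=1}^{k}\sum_{r=1}^{n} (Y_{jr} - \bar{Y})^2}{nk - 1} \]

Variance among the sample means:

\[ S_a^2 = \frac{\sum_{j=1}^{k} (\bar{y}_j - \bar{Y})^2}{k - 1} \]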
As in cluster sampling, the parameter $\rho_w$ is the correlation coefficient between pairs of units
that are in the same systematic sample. In terms of $\rho_w$, the variance among the sample means is

\[ S_a^2 = \frac{S^2 (N - 1)\,[\rho_w (n - 1) + 1]}{n^2 (k - 1)}. \]

In systematic sampling, the estimated mean, denoted by $\bar{y}_{sy}$, is simply one of the
$\bar{y}_j$'s, depending on which random number was chosen to start sampling. Therefore,

\[ \bar{y}_{sy} = \frac{1}{n}\sum_{r=1}^{n} y_{jr} = \bar{y}_j. \]

Theorem 8.1: Let $y_{jr}$ denote the $r$th element of the $j$th systematic sample with $N = nk$,
where $j = 1, 2, \ldots, k$ and $r = 1, 2, \ldots, n$. Then the mean of a systematic sample,
$\bar{y}_{sy}$, is an unbiased estimate of $\bar{Y}$, and its variance is

\[ V(\bar{y}_{sy}) = \frac{1}{k}\sum_{j=1}^{k} (\bar{y}_j - \bar{Y})^2 = \frac{k - 1}{k}\, S_a^2, \qquad \text{where } S_a^2 = \frac{\sum_{j=1}^{k} (\bar{y}_j - \bar{Y})^2}{k - 1}. \]

Prove this theorem.

Corollary: The variance of the mean of a systematic sample, expressed in terms of $\rho_w$, is

\[ V(\bar{y}_{sy}) = \frac{k - 1}{k}\, S_a^2 = \frac{N - 1}{N}\,\frac{S^2}{n}\,[\rho_w (n - 1) + 1] \approx \frac{S^2}{n}\,[\rho_w (n - 1) + 1] \quad \text{for large } N. \]

8.3 Comparison With SRS

The mean of a systematic sample is more precise than the mean of an SRS when $\rho_w$ is
negative, more precisely when $\rho_w < -\frac{1}{N-1}$. Show that $\rho_w = -\frac{1}{N-1}$ if
$V(\bar{y}_{sy}) = V(\bar{y})$.
• If $\rho_w < -\frac{1}{N-1}$, then $\bar{y}_{sy}$ is more precise. This can occur when the
  population is ordered with respect to the study variable.
• If $\rho_w = -\frac{1}{N-1}$, or $\rho_w = 0$ for large $N$, then the two are equally efficient. This
  can occur when the population's elements are arranged in random order with respect to the
  variable under study.
• If $\rho_w > -\frac{1}{N-1}$, or $\rho_w > 0$ for large $N$, then $\bar{y}_{sy}$ is less efficient. This
  can happen when the population's elements are arranged in a list with a high degree of periodicity.
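The threshold $-1/(N-1)$ follows in a few lines from the variance of $\bar{y}_{sy}$ expressed in terms of $\rho_w$. The derivation below is a sketch. It assumes the usual variance of the mean under simple random sampling without replacement, $V(\bar{y}) = \frac{N-n}{N}\,\frac{S^2}{n}$, and $n > 1$.

```latex
\begin{align*}
V(\bar{y}_{sy}) = V(\bar{y})
  &\iff \frac{N-1}{N}\,\frac{S^2}{n}\,\bigl[1 + (n-1)\rho_w\bigr] = \frac{N-n}{N}\,\frac{S^2}{n} \\
  &\iff (N-1)\bigl[1 + (n-1)\rho_w\bigr] = N - n \\
  &\iff (N-1)(n-1)\rho_w = -(n-1) \\
  &\iff \rho_w = -\frac{1}{N-1}.
\end{align*}
```

Replacing the equalities by "<" or ">" throughout gives the other two cases in the list.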

8.4 Estimation of the Variance from a Single Sample

Consider a case where the population is in “random order”. That is, there is no trend or
stratification in the study variable, and no correlation between neighboring values. In this case
systematic sampling is on average equivalent to simple random sampling, $E(V_{sy}) = V_{ran}$.
The variance estimate from sample $j$ is then the same as under srs:

\[ v(\bar{y}_{sy}) = \frac{1 - f}{n}\, s_{y_j}^2, \qquad \text{where } s_{y_j}^2 = \frac{\sum_{r=1}^{n} (y_{jr} - \bar{y}_j)^2}{n - 1} \text{ and } f = n/N. \]

8.5 Estimation of Mean/ Total from Repeated Systematic Sampling (Sub-samples)

Suppose that the required sample of size $n$ is selected from $N$ units in the form of two or more
(say $m$) systematic sub-samples of the same size with independent random starts. Assume that
these $m$ sub-samples, each of size $n' = n/m$, are to be selected from the population of $N$
units. Also let $N/n = k$. Then select $m$ random starts between 1 and $k'$, one for each of the $m$
sub-samples, using srs with replacement (wr) or without replacement (wor), where $k' = mk$.

Let $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_m$ be the estimators of the population mean based on the
$m$ sub-samples, each of size $n' = n/m$, i.e.,

\[ \bar{y}_i = \frac{1}{n'}\sum_{j=1}^{n'} y_{ij}, \quad \text{the mean of sub-sample } i. \]

For srs without replacement,

i) \[ \bar{y}_{sy} = \frac{1}{m}\sum_{i=1}^{m} \bar{y}_i, \] which is an unbiased estimator of the population mean $\bar{Y}$.

ii) \[ V(\bar{y}_{sy}) = \frac{k - 1}{km(km - 1)}\sum_{i=1}^{k'} (\bar{y}_i - \bar{Y})^2, \] where the sum runs over all $k' = km$ possible sub-samples, with its unbiased estimator given by

\[ v(\bar{y}_{sy}) = \frac{k - 1}{km(m - 1)}\sum_{i=1}^{m} (\bar{y}_i - \bar{y}_{sy})^2. \]
Example:
A dairy research institute is interested in estimating the total milk yield of a cow in connection
with a breeding program. The total lactation period was taken as 300 days. It was decided to
select 3 systematic sub-samples without replacement, each of 10 days, to arrive at a total
sample size of 30 days. The milk yield recorded on the selected days is given below. The random
starts used for the three sub-samples were 1, 12, and 26.

        Sub-sample I              Sub-sample II             Sub-sample III
        Day     Yield (liters)    Day     Yield (liters)    Day     Yield (liters)
        1        8.10             12       9.30             26      11.15
        31      12.00             42      13.50             56      14.70
        61      15.20             72      14.40             86      14.60
        91      14.00             102     14.35             116     12.80
        121     11.25             132      9.80             146     10.65
        151     10.10             162     10.00             176     10.60
        181      9.80             192      8.60             206      8.30
        211      8.75             222      8.40             236      7.50
        241      7.25             252      6.10             266      4.30
        271      4.10             282      3.10             296      2.20
Mean            10.055                     9.755                     9.680

Estimate the total milk yield (in liters) and estimate its variance.

$N = 300$, $n = 30$, $m = 3$, $k = N/n = 300/30 = 10$.
Select $n' = n/m = 30/3 = 10$ days from 300 with interval $k' = mk = 3 \times 10 = 30$.

\[ \hat{Y}_{sy} = N\,\bar{y}_{sy} = \frac{N}{m}\sum_{i=1}^{m} \bar{y}_i = \frac{300}{3}\,(10.055 + 9.755 + 9.680) = 2949 \text{ liters} \]

\[ v(\hat{Y}_{sy}) = \frac{N^2 (k - 1)}{km(m - 1)}\sum_{i=1}^{m} (\bar{y}_i - \bar{y}_{sy})^2 \]

\[ v(\hat{Y}_{sy}) = \frac{(300)^2 (10 - 1)}{10 \times 3 \times 2}\left[(10.055 - 9.83)^2 + (9.755 - 9.83)^2 + (9.68 - 9.83)^2\right] = 1063.125 \]

8.6 Population with Linear Trend and periodicity

a) Linear Trend

Suppose a population has a linear trend that can be expressed in the form $y_i = i$. That is,
$y_1 = 1,\ y_2 = 2,\ y_3 = 3,\ \ldots,\ y_i = i,\ \ldots,\ y_N = N$ for a population of size $N$. Then

\[ \sum_{i=1}^{N} y_i = \sum_{i=1}^{N} i = \frac{N(N+1)}{2}, \qquad \sum_{i=1}^{N} y_i^2 = \sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6} \]

\[ S^2 = \frac{1}{N-1}\left[\sum_{i=1}^{N} y_i^2 - \frac{1}{N}\Bigl(\sum_{i=1}^{N} y_i\Bigr)^2\right] = \frac{1}{N-1}\left[\frac{N(N+1)(2N+1)}{6} - \frac{1}{N}\Bigl(\frac{N(N+1)}{2}\Bigr)^2\right] = \frac{N(N+1)}{12} \]

Substituting this value into $V(\bar{y})$ gives

\[ V(\bar{y}) = \frac{S^2}{n}\,\frac{N - n}{N} = \frac{(k-1)(N+1)}{12} = \frac{(k-1)(nk+1)}{12} \]
Similarly, the means of the $k$ possible systematic samples, $\bar{y}_j$, are in increasing order and
one unit apart from each other: $\bar{y}_j = j + (n-1)k/2$ for $j = 1, 2, \ldots, k$. Their variance is
therefore the same as that of the values $1, 2, \ldots, k$:

\[ V(\bar{y}_{sy}) = \frac{1}{k}\sum_{j=1}^{k} (\bar{y}_j - \bar{Y})^2 = \frac{1}{k}\left[\sum_{j=1}^{k} j^2 - \frac{1}{k}\Bigl(\sum_{j=1}^{k} j\Bigr)^2\right] = \frac{1}{k}\left[\frac{k(k+1)(2k+1)}{6} - \frac{1}{k}\Bigl(\frac{k(k+1)}{2}\Bigr)^2\right] = \frac{k^2 - 1}{12} \]

$V(\bar{y}_{sy}) \le V(\bar{y})$ if and only if $\frac{k^2 - 1}{12} \le \frac{(k-1)(nk+1)}{12}$, which holds
for every $n \ge 1$. If $n = 1$, they are equal; otherwise the systematic sample is much more precise.
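The two closed forms for the linear-trend population can be checked by brute force. The following sketch enumerates all $k$ systematic samples of the population $y_i = i$ and compares the results with $(k^2 - 1)/12$ and $(k-1)(nk+1)/12$. The function name is hypothetical.

```python
def linear_trend_variances(n, k):
    """Return (V_sy, V_srs) for the population y_i = i, i = 1..N, with N = n * k."""
    N = n * k
    y = list(range(1, N + 1))
    Ybar = sum(y) / N
    S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
    # Simple random sampling without replacement.
    v_srs = S2 / n * (N - n) / N
    # Systematic sampling: average squared deviation of the k sample means.
    sample_means = [sum(y[j::k]) / n for j in range(k)]
    v_sy = sum((mj - Ybar) ** 2 for mj in sample_means) / k
    return v_sy, v_srs


n, k = 4, 5
v_sy, v_srs = linear_trend_variances(n, k)
print(v_sy, (k ** 2 - 1) / 12)                # systematic: brute force vs closed form
print(v_srs, (k - 1) * (n * k + 1) / 12)      # srs: brute force vs closed form
```

For $n = 4$ and $k = 5$ the systematic variance is 2 while the simple random sampling variance is 7, up to floating-point rounding.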

b) Periodicity:

Periodicity is an arrangement of the elements of a population in a certain pattern in a list or in a
natural record, so that a value repeats itself at a certain interval or at integral multiples of that
interval.

Consider an artificial example of 12 students in a class whose academic performance is graded
as A, B, C, or D. Suppose that the grades are listed in the order A B C D A B C D A B C D. A
sample of three students is taken from the list to assess their performance.
Since $k = N/n = 12/3 = 4$, we can take a systematic sample of 1 in 4 from the list. There are four
possible samples, one for each $j = 1, 2, 3, 4$.

j = 1             j = 2             j = 3             j = 4
Unit   Grade      Unit   Grade      Unit   Grade      Unit   Grade
1      A          2      B          3      C          4      D
5      A          6      B          7      C          8      D
9      A          10     B          11     C          12     D

This shows that all students with the same performance are selected in one sample, which
produces large differences between the sample means. In this situation $\rho_w$ is very high, which
implies low precision; it can be calculated if the grades are converted to numeric values. To
overcome this periodicity problem, arrange the grades in increasing or decreasing order before
the sample is selected.
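To make the loss of precision concrete, the grades can be converted to numbers and the variances compared. The coding A = 4, B = 3, C = 2, D = 1 used below is an assumption made only for this sketch, and the function names are hypothetical.

```python
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}   # assumed numeric coding


def systematic_sample_means(values, k):
    """Means of the k possible 1-in-k systematic samples."""
    return [sum(values[j::k]) / len(values[j::k]) for j in range(k)]


def variance_of_systematic_mean(values, k):
    """V(ybar_sy) = (1/k) * sum_j (ybar_j - Ybar)^2."""
    Ybar = sum(values) / len(values)
    return sum((m - Ybar) ** 2 for m in systematic_sample_means(values, k)) / k


def variance_of_srs_mean(values, n):
    """V(ybar) under simple random sampling without replacement."""
    N = len(values)
    Ybar = sum(values) / N
    S2 = sum((v - Ybar) ** 2 for v in values) / (N - 1)
    return S2 / n * (N - n) / N


periodic = [GRADE_POINTS[g] for g in "ABCD" * 3]   # A B C D A B C D A B C D
ordered = sorted(periodic, reverse=True)           # A A A B B B C C C D D D

print(systematic_sample_means(periodic, 4))        # [4.0, 3.0, 2.0, 1.0]
print(variance_of_systematic_mean(periodic, 4))    # periodic list
print(variance_of_srs_mean(periodic, 3))           # srs, any list order
print(variance_of_systematic_mean(ordered, 4))     # list sorted by grade
```

Under this coding the periodic list gives a systematic-sample variance of 1.25, compared with about 0.34 for simple random sampling and about 0.14 once the list is sorted by grade.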

8.7 Advantages and Disadvantages of Systematic Sampling:

Systematic sampling has the following advantages over simple random sampling.

• It is easy and simple to apply; the sample is easier to draw since only one random number is
  required; costs per sampled unit may be lower.
• From an operational point of view, it is a convenient method for selecting sampling units
  in the field and can easily be taught to individuals who have little training in survey
  methodology. It also allows easy checking of the selection.
• Systematic sampling tends to distribute the sample more evenly over the listed population.
• Higher precision is often associated with systematic sampling, especially if the arrangement
  of the units in the list is related to the characteristics of interest. It has an effect similar to
  stratification and is often called “implicit stratification”.
• It provides more efficient estimates of the population mean or total than simple random
  sampling for populations with a linear trend.

Disadvantage:
• A serious disadvantage of this method lies in its use with populations having unforeseen
  periodicity, which may substantially reduce the precision of the estimate of the mean or total.
  In this case $\rho_w > 0$, and for large $N$, $V(\bar{y}_{sy}) > V(\bar{y}_{SRS})$.
