
Sampling Theory

MODULE I

LECTURE - 1
INTRODUCTION

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Statistics is the science of data.
Data are numerical values containing some information.

Statistical tools can be used on a data set to draw statistical inferences. These statistical inferences are in turn used
for various purposes. For example, the government uses such data for policy formulation for the welfare of the people,
marketing companies use the data from consumer surveys to improve their products and to provide better
services to the customers, etc. Such data are obtained through sample surveys. Sample surveys are conducted
throughout the world by governmental as well as non-governmental agencies. For example, the “National Sample
Survey Organization (NSSO)” conducts surveys in India, “Statistics Canada” conducts surveys in Canada, and agencies
of the United Nations like the “World Health Organization (WHO)”, “Food and Agricultural Organization (FAO)”, etc. conduct
surveys in different countries.

Sampling theory provides the tools and techniques for data collection, keeping in mind the objectives to be
fulfilled and the nature of the population.

There are two ways of obtaining the information:

1. Sample surveys

2. Complete enumeration or census

Sample surveys collect information on a fraction of the total population, whereas in a census the information is collected
on the whole population. Some surveys, e.g., economic surveys, agricultural surveys, etc., are conducted regularly.
Other surveys are need-based and are conducted when some need arises, e.g., consumer satisfaction surveys at a
newly opened shopping mall to assess the satisfaction level with the amenities provided in the mall.
Sampling unit:

An element or a group of elements on which observations can be taken is called a sampling unit. The objective
of the survey helps in determining the definition of the sampling unit.

For example, if the objective is to determine the total income of all the persons in the household, then the
sampling unit is the household. If the objective is to determine the income of any particular person in the household,
then the sampling unit is the income of that particular person in the household. So the definition of the sampling unit
depends on and varies with the objective of the survey. Similarly, in another example, if the objective is to study
the blood sugar level, then the sampling unit is the value of the blood sugar level of a person. On the other hand, if
the objective is to study the health conditions, then the sampling unit is the person on whom the readings on
blood sugar level, blood pressure and other factors will be obtained. These values together will classify the
person as healthy or unhealthy.

Population:

The collection of all the sampling units in a given region at a particular point of time or during a particular period is called the
population. For example, if the medical facilities in a hospital are to be surveyed through the patients, then all the
patients registered in the hospital during the time period of the survey will constitute the population.
Similarly, if the production of wheat in a district is to be studied, then all the fields cultivating wheat in that district
will constitute the population. The total number of sampling units in the population is the population size,
denoted generally by N. The population size can be finite or infinite (N is large).
Census:

The complete count of the population is called a census. The observations on all the sampling units in the population
are collected in a census. For example, in India, the census is conducted every tenth year, in which
observations on all the persons staying in India are collected.

Sample:

One or more sampling units are selected from the population according to some specified procedure.

A sample consists only of a portion of the population units.

In the context of sample surveys, a collection of units like households, people, cities, countries, etc. is called a
finite population.

A census is a 100% sample and it is a complete count of the population.

Representative sample:

All the salient features of the population are present in the sample.

Ideally, every selected sample should be a representative sample.

For example, if a population has 30% males and 70% females, then we also expect the sample to have
nearly 30% males and 70% females.

In another example, if we take out a handful of wheat from a 100 kg bag of wheat, we expect the same
quality of wheat in the hand as inside the bag. Similarly, it is expected that a drop of blood will give the same
information as all the blood in the body.

Sampling frame:

The list of all the units of the population to be surveyed constitutes the sampling frame. All the sampling units in
the sampling frame have identification particulars. For example, all the students in a particular university listed
along with their roll numbers constitute the sampling frame. Similarly, the list of households with the name of the
head of the family or the house address constitutes the sampling frame. In another example, the residents of a city
area may be listed in more than one frame - as per automobile registration as well as the listing in the
telephone directory.
Ways to ensure representativeness:

There are two possible ways to ensure that the selected sample is representative.

1. Random sample or probability sample:

The selection of units in the sample from a population is governed by the laws of chance or probability.

The probability of selection of a unit can be equal as well as unequal.

2. Non-random sample or purposive sample:

The selection of units in the sample from the population is not governed by probability laws.

For example, the units are selected on the basis of the personal judgment of the surveyor. The persons
volunteering to take some medical test or to drink a new type of coffee also constitute a sample drawn on
non-random grounds.
Another type of sampling is quota sampling. The survey in this case is continued until a predetermined
number of units with the characteristic under study is picked up.

For example, in order to conduct an experiment for a rare type of disease, the survey is continued till the
required number of patients with the disease is obtained.
Advantages of sampling over complete enumeration:

1. Reduced cost and enlarged scope: Sampling involves the collection of data on a smaller number of units in
comparison to complete enumeration, so the cost involved in the collection of information is reduced. Further,
additional information can be obtained at little extra cost in comparison to conducting another survey. For example,
when an interviewer is collecting information on health conditions, then he/she can also ask some questions on
health practices. This will provide additional information on health practices, and the cost involved will be much
less than conducting an entirely new survey on health practices.

2. Organization of work: It is easier to manage the organization of collection of a smaller number of units
than all the units in a census. For example, in order to draw a representative sample from a state, it is easier
to manage to draw small samples from every city than to draw the sample from the whole state at a time. This
ultimately results in more accuracy in the statistical inferences, because better organization provides better data
and, in turn, improved statistical inferences are obtained.

3. Greater accuracy: The persons involved in the collection of data are trained personnel. They can collect the
data more accurately if they have to collect a smaller number of units than a large number of units in a given time.

4. Urgent information required: The data from a sample can be quickly summarized. For example, the forecasting
of the crop production can be done more quickly on the basis of a sample of data than by first collecting all the
observations.

5. Feasibility: Conducting the experiment on a smaller number of units, particularly when the units are destroyed,
is more feasible. For example, in determining the life of bulbs, it is more feasible to fuse a minimum number of
bulbs. Similarly, in any medical experiment, it is more feasible to use a smaller number of animals.
Sampling Theory
MODULE I

LECTURE - 2
INTRODUCTION

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Type of Surveys:

1. Demographic surveys

These surveys are conducted to collect the demographic data, e.g., household surveys, family size, number
of males in families, etc.
Such surveys are useful in the policy formulation for any city, state or country for the welfare of the people.

2. Educational surveys

These surveys are conducted to collect the educational data, e.g., how many children go to school, how
many persons are graduate, etc.
Such surveys are conducted to examine the educational programs in schools and colleges. Generally,
schools are selected first, and then the students from each school constitute the sample.

3. Economic surveys

These surveys are conducted to collect the economic data, e.g., data related to export and import of goods,
industrial production, consumer expenditure etc. Such data is helpful in constructing the indices indicating the
growth in a particular sector of economy or even the overall economic growth of the country.
4. Employment surveys

These surveys are conducted to collect the employment related data, e.g., employment rate, labour
conditions, wages, etc. in a city, state or country. Such data helps in constructing various indices to know
the employment conditions among the people.

5. Health and nutrition surveys

These surveys are conducted to collect the data related to health and nutrition issues, e.g., number of visits to
doctors, food given to children, nutritional value etc. Such surveys are conducted in cities, states as well as
countries by national and international organizations like UNICEF, WHO etc.

6. Agricultural surveys

These surveys are conducted to collect the agriculture related data to estimate, e.g., the acreage and
production of crops, livestock numbers, use of fertilizers, use of pesticides and other related topics. The
government bases its planning related to the food issues for the people on such surveys.
7. Marketing surveys

These surveys are conducted to collect the data related to marketing. They are conducted by major
companies, manufacturers or those who provide services to consumers, etc. Such data is used for knowing
the satisfaction and opinion of consumers as well as in developing the sales, purchase and promotional
activities.

8. Election surveys

These surveys are conducted to study the outcome of an election or a poll. For example, such polls are
conducted in democratic countries to have the opinions of people about any candidate who is contesting
the election.

9. Public polls and surveys

These surveys are conducted to collect the public opinion on any particular issue. Such surveys are
generally conducted by news media and agencies which conduct polls and surveys on current topics of
interest to public.

10. Campus surveys

These surveys are conducted on the students of any educational institution to study the educational
programs, living facilities, dining facilities, sports activities, etc.
Principal steps in a sample survey:
The broad steps to conduct any sample survey are as follows:

1. Objective of the survey:

The objective of the survey has to be clearly defined and well understood by the person planning to
conduct it. The statistician is expected to be well versed with the issues to be addressed, in
consultation with the person who wants to get the survey conducted. In complex surveys, sometimes the
objective is forgotten and data are collected on issues which are far away from the objectives.

2. Population to be sampled:

Based on the objectives of the survey, decide the population from which the information can be obtained.
For example, population of farmers is to be sampled for an agricultural survey whereas the population of
patients has to be sampled for determining the medical facilities in a hospital.

3. Data to be collected:

It is important to decide which data are relevant for fulfilling the objectives of the survey and to ensure that
no essential data are omitted. Sometimes, too many questions are asked and some of their outcomes are
never utilized. This lowers the quality of the responses and, in turn, results in lower efficiency in the statistical
inferences.
4. Degree of precision required:

The results of any sample survey are always subject to some uncertainty. Such uncertainty can be
reduced by taking larger samples or using superior instruments, but this involves more cost and more time. So
it is very important to decide about the required degree of precision in the data. This needs to be conveyed to
the surveyor also.

5. Method of measurement:

The choice of the measuring instrument and the method to measure the data from the population need to be
specified clearly. For example, the data may be collected through an interview, a questionnaire, a personal visit,
a combination of any of these approaches, etc. The forms in which the data are to be recorded, so that the data
can be transferred to mechanical equipment for easily creating the data summary, also need to be
prepared accordingly.

6. The frame:

The sampling frame has to be clearly specified. The population is divided into sampling units such that the
units cover the whole population and every sampling unit is tagged with identification. The list of all
sampling units is called the frame. The frame must cover the whole population and the units must not
overlap each other in the sense that every element in the population must belong to one and only one unit.
For example, the sampling unit can be an individual member in the family or the whole family.
7. Selection of sample:

The size of the sample needs to be specified for the given sampling plan. This helps in determining and
comparing the relative cost and time of different sampling plans. The method and plan adopted for drawing
a representative sample should also be detailed.

8. The Pre-test:

It is advised to try the questionnaire and field methods on a small scale. This may reveal some troubles and
problems beforehand which the surveyor may face in the field in large scale surveys.

9. Organization of the field work:

How to conduct the survey, how to handle administrative issues, providing proper training to
surveyors, procedures, plans for handling nonresponse and missing observations, etc. are some of the
issues which need to be addressed for organizing the survey work in the field. The procedure for early
checking of the quality of returns should be prescribed. It should be clarified how to handle the situation when
the respondent is not available.
10. Summary and analysis of data:

It is to be noted that, based on the objectives of the survey, a suitable statistical tool is decided which can
answer the relevant questions. In order to use the statistical tool, a valid data set is required, and this dictates
the choice of responses to be obtained for the questions in the questionnaire, e.g., whether the data has to be
qualitative, quantitative, nominal, ordinal, etc. After getting the completed questionnaires back, they need to be
edited to amend the recording errors and delete erroneous data. The tabulating procedures, methods of
estimation and the tolerable amount of estimation error need to be decided before the start of the survey.
Different methods of estimation may be available to answer the same query from the same data set.
So the data need to be collected in a way which is compatible with the chosen estimation procedure.

11. Information gained for future surveys:

The completed surveys work as a guide for improved sample surveys in the future. Besides this, they also supply
various types of prior information required to use various statistical tools, e.g., mean, variance, nature of
variability, cost involved, etc. Any completed sample survey acts as a potential guide for the surveys to be
conducted in the future. It is generally seen that in any complex survey things do not always go as planned
earlier. Such precautions and alerts help in avoiding mistakes in the execution of future surveys.
Variability control in sample surveys:

The variability control is an important issue in any statistical analysis. A general objective is to draw statistical
inferences with minimum variability. There are various types of sampling schemes which are adopted in different
conditions. These schemes help in controlling the variability at different stages. Such sampling schemes can be
classified in the following way.

1. Before the selection of sampling units

• Stratified sampling
• Cluster sampling
• Two stage sampling
• Double sampling etc.

2. At the time of selection of sampling units

• Systematic sampling
• Varying probability sampling

3. After the selection of sampling units

• Ratio method of estimation
• Regression method of estimation

Note that the ratio and regression methods are methods of estimation and not methods of drawing samples.
Methods of data collection

1. Physical observations and measurements:

The surveyor contacts the respondent personally through a meeting. He observes the sampling unit and records the
data. The surveyor can always use his prior experience to collect the data in a better way. For example, a young man
reporting his age as 60 years can easily be noticed and corrected by the surveyor.

2. Personal interview:

The surveyor is supplied with a well prepared questionnaire. The surveyor goes to the respondents and asks the
same questions mentioned in the questionnaire. The questionnaire is then filled in based on the responses from
the respondents.

3. Mail enquiry:

The well prepared questionnaire is sent to the respondents through postal mail, e-mail, etc. The respondents are
requested to fill in the questionnaire and send it back. In the case of postal mail, the questionnaires are often
accompanied by a self addressed envelope with postage stamps to avoid any non-response due to the cost of
postage.
4. Web based enquiry:
The survey is conducted online through internet based web pages. There are various websites which provide such
a facility. The questionnaires are prepared in the website's format and the link is sent to the respondents through email.
By clicking on the link, the respondent is taken to the concerned website and the answers are given online.
These answers are recorded, and the responses as well as their statistics are sent to the surveyor. The respondents
must have an internet connection for data collection with this procedure.

5. Registration:
The respondent is required to register the data at some designated place. For example, the number of births and
deaths, along with the details provided by the family members, are recorded at the city municipal office.

6. Transcription from records:


A sample of data is collected from already recorded information. For example, the details of the number of
persons in different families or the number of births/deaths in a city can be obtained directly from the city municipal
office. The methods in (1) to (5) provide primary data, i.e., data collected directly from the source.
The method in (6) provides secondary data, i.e., data obtained from already existing records.
Sampling Theory
MODULE II

LECTURE - 3
SIMPLE RANDOM SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Simple random sampling
Simple random sampling (SRS) is a method of selecting a sample of n sampling units from
a population of N units such that every sampling unit has an equal chance of being chosen.

• The samples can be drawn in two possible ways:

• The sampling units are chosen without replacement in the sense that the units once chosen are not
placed back in the population.

• The sampling units are chosen with replacement in the sense that the chosen units are placed back
in the population.

Based on these two concepts, there are two approaches for SRS:

1. Simple random sampling without replacement (SRSWOR)

SRSWOR is a method of selection of n units out of the N units one by one such that at any stage of selection,
any one of the remaining units has the same chance of being selected; the probability that a specified population
unit is selected at any given draw is 1/N, as shown later.

2. Simple random sampling with replacement (SRSWR)

SRSWR is a method of selection of n units out of the N units one by one such that at each stage of selection,
each unit has an equal chance of being selected, i.e., 1/N.
Procedure of selection of a random sample

Suppose there are N units in the population out of which n units are to be selected.

1. Identify the N units in the population with the numbers 1 to N.

2. Choose any random number arbitrarily from the random numbers table and start reading numbers.
3. Choose the sampling unit whose serial number corresponds to the random number drawn from the table of
random numbers.
4. In case of SRSWR, all the random numbers are accepted even if repeated more than once.
5. In case of SRSWOR, if any random number is repeated, then it is ignored and more numbers are drawn.

Such a process can also be implemented through programming using the discrete uniform distribution. Any number
between 1 and N can be generated from this distribution, and the corresponding unit can be selected into the sample by
associating an index with each sampling unit. Many statistical software packages like R, SAS, etc. have built-in functions for
drawing a sample using SRSWOR or SRSWR.
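As a minimal illustration of this selection procedure, here is a sketch in Python (assuming only the standard library; the population labels and sizes are made up for the example):

import random

# Hypothetical population of N = 10 labelled units.
population = list(range(1, 11))
n = 4  # sample size

# SRSWOR: a unit once chosen is not placed back, so no label can repeat
# (a repeated random number would simply be ignored, as in step 5).
srswor_sample = random.sample(population, n)

# SRSWR: every draw is made from the full population, so repeats can occur
# (all random numbers are accepted, as in step 4).
srswr_sample = random.choices(population, k=n)

print("SRSWOR sample:", srswor_sample)
print("SRSWR sample :", srswr_sample)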

The following notations will be used in further notes:

N : Number of sampling units in the population (population size)
n : Number of sampling units in the sample (sample size)
Y : The characteristic under consideration
Y_i : Value of the characteristic for the ith unit of the population

$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$ : population mean

$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ : sample mean

$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right)$

$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right)$

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)$
Probability of drawing a sample

1. SRSWOR

If n units are selected by SRSWOR, the total number of possible samples is $\binom{N}{n}$.

So the probability of selecting any one of these samples is $\frac{1}{\binom{N}{n}}$.

Alternatively, if $u_1, u_2, \ldots, u_n$ are the units selected in the sample, then

$P(u_1, u_2, \ldots, u_n) = P(u_1)\,P(u_2)\cdots P(u_n).$

To compute this expression, consider the probability that a specified unit, say the ith unit $u_i$, is included in the sample.
The ith unit can be selected either at the first draw, the second draw, ..., or the nth draw. Thus the required probability is

$P_1(i) + P_2(i) + \cdots + P_n(i) = \frac{1}{N} + \frac{1}{N} + \cdots + \frac{1}{N} \;(n \text{ times}) = \frac{n}{N}$

where $P_j(i)$ denotes the probability of selection of $u_i$ at the jth draw, $j = 1, 2, \ldots, n$.
If $P(u_1) = \frac{n}{N}$, then

$P(u_2) = \frac{n-1}{N-1}, \;\ldots,\; P(u_n) = \frac{1}{N-n+1}.$

Thus

$P(u_1, u_2, \ldots, u_n) = \frac{n}{N}\cdot\frac{n-1}{N-1}\cdot\frac{n-2}{N-2}\cdots\frac{1}{N-n+1} = \frac{1}{\binom{N}{n}}.$

2. SRSWR

When n units are selected with SRSWR, the total number of possible samples is $N^n$. The probability of drawing a
sample is $\frac{1}{N^n}$.

Alternatively,

$P(u_1, u_2, \ldots, u_n) = P(u_1)\,P(u_2)\cdots P(u_n) = \frac{1}{N}\cdot\frac{1}{N}\cdots\frac{1}{N} = \frac{1}{N^n}.$
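These counts can be verified by brute-force enumeration for a tiny population (a sketch; N = 5 and n = 2 are arbitrary toy values):

from itertools import combinations, product
from math import comb

N, n = 5, 2  # toy values for illustration
units = range(1, N + 1)

# SRSWOR: C(N, n) unordered samples, each with probability 1 / C(N, n).
wor_samples = list(combinations(units, n))
print(len(wor_samples), comb(N, n), 1 / comb(N, n))  # 10 10 0.1

# SRSWR: N^n ordered samples, each with probability 1 / N^n.
wr_samples = list(product(units, repeat=n))
print(len(wr_samples), N ** n, 1 / N ** n)           # 25 25 0.04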
Probability of drawing a unit

1. SRSWOR

Let $A_l$ denote the event that a particular unit $u_j$ is not selected at the lth draw. The probability of selecting, say,
the jth unit at the kth draw is

$P(\text{selection of } u_j \text{ at the } k\text{th draw}) = P(A_1 \cap A_2 \cap \cdots \cap A_{k-1} \cap \bar{A}_k)$

$= P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 A_2)\cdots P(A_{k-1} \mid A_1 A_2 \cdots A_{k-2})\,P(\bar{A}_k \mid A_1 A_2 \cdots A_{k-1})$

$= \left(1-\frac{1}{N}\right)\left(1-\frac{1}{N-1}\right)\left(1-\frac{1}{N-2}\right)\cdots\left(1-\frac{1}{N-k+2}\right)\frac{1}{N-k+1}$

$= \frac{N-1}{N}\cdot\frac{N-2}{N-1}\cdots\frac{N-k+1}{N-k+2}\cdot\frac{1}{N-k+1} = \frac{1}{N}.$

2. SRSWR

$P(\text{selection of } u_j \text{ at the } k\text{th draw}) = \frac{1}{N}.$
Sampling Theory
MODULE II

LECTURE - 4
SIMPLE RANDOM SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Estimation of population mean and population variance

One of the main objectives after the selection of a sample is to know about the tendency of the data to cluster
around the central value and the scatter of the data around the central value.

Among various indicators of central tendency and dispersion, the popular choices are arithmetic mean and
variance. So the population mean and population variability are generally measured by arithmetic mean (or
weighted arithmetic mean) and variance.

There are various popular estimators for estimating the population mean and population variance. Among
them, the sample arithmetic mean and sample variance are more popular than other estimators.

One of the reasons to use these estimators is that they possess nice statistical properties. Moreover, they are
also obtained through well established statistical estimation procedures like maximum likelihood estimation,
least squares estimation, the method of moments, etc. under several standard statistical distributions.

One may also consider other indicators like the median, mode, geometric mean and harmonic mean for measuring the
central tendency, and the mean deviation, absolute deviation, Pitman nearness, etc. for measuring the dispersion.
The properties of such estimators can be studied by numerical procedures like bootstrapping.
1. Estimation of population mean

Let us consider the sample arithmetic mean $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ as an estimator of the population
mean $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$ and verify whether $\bar{y}$ is an unbiased estimator of $\bar{Y}$ under the two cases.

• SRSWOR

Let $t_j = \sum_{i=1}^{n} y_i$ denote the total of the jth possible sample. Averaging over all $\binom{N}{n}$ equally likely samples,

$E(\bar{y}) = \frac{1}{\binom{N}{n}}\sum_{j=1}^{\binom{N}{n}} \frac{t_j}{n} = \frac{1}{n\binom{N}{n}}\sum_{j=1}^{\binom{N}{n}}\left(\sum_{i=1}^{n} y_i\right).$

When n units are sampled from N units without replacement, each unit of the population can occur
with (n - 1) other units selected out of the remaining (N - 1) units in the population, and so each unit occurs in
$\binom{N-1}{n-1}$ of the $\binom{N}{n}$ possible samples. So

$\sum_{j=1}^{\binom{N}{n}}\left(\sum_{i=1}^{n} y_i\right) = \binom{N-1}{n-1}\sum_{i=1}^{N} Y_i.$

Now

$E(\bar{y}) = \frac{(N-1)!}{(n-1)!(N-n)!}\cdot\frac{n!(N-n)!}{n\,N!}\sum_{i=1}^{N} Y_i = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}.$

Thus $\bar{y}$ is an unbiased estimator of $\bar{Y}$.
Alternatively, the following approach can also be adopted to show that the sample mean is an unbiased estimator
of the population mean:

$E(\bar{y}) = \frac{1}{n}\sum_{j=1}^{n} E(y_j) = \frac{1}{n}\sum_{j=1}^{n}\left[\sum_{i=1}^{N} Y_i\,P_j(i)\right] = \frac{1}{n}\sum_{j=1}^{n}\left[\sum_{i=1}^{N} Y_i\cdot\frac{1}{N}\right] = \frac{1}{n}\sum_{j=1}^{n} \bar{Y} = \bar{Y}$

where $P_j(i)$ denotes the probability of selection of the ith unit at the jth draw.

• SRSWR

$E(\bar{y}) = \frac{1}{n}E\left(\sum_{i=1}^{n} y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(y_i) = \frac{1}{n}\sum_{i=1}^{n}\left(Y_1 P_1 + \cdots + Y_N P_N\right) = \frac{1}{n}\sum_{i=1}^{n} \bar{Y} = \bar{Y}$

where $P_i = \frac{1}{N}$ for all $i = 1, 2, \ldots, N$ is the probability of selection of a unit. Thus $\bar{y}$ is an unbiased estimator of the
population mean under SRSWR also.
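This unbiasedness can be checked empirically with a short simulation (a sketch; the population values are made up, and the simulated averages only approximate the exact expectations):

import random

# Hypothetical population values.
Y = [3, 7, 1, 9, 4, 6, 2, 8, 5, 10]
Ybar = sum(Y) / len(Y)            # true population mean = 5.5
n, reps = 4, 100_000

# Average the sample mean over many repeated draws under both schemes.
wor = sum(sum(random.sample(Y, n)) / n for _ in range(reps)) / reps
wr = sum(sum(random.choices(Y, k=n)) / n for _ in range(reps)) / reps

print(Ybar, round(wor, 2), round(wr, 2))  # all three values should be close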
Variance of the estimate

Assume that each observation has the same variance $\sigma^2$. Then

$V(\bar{y}) = E(\bar{y} - \bar{Y})^2 = E\left[\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{Y})\right]^2$

$= E\left[\frac{1}{n^2}\sum_{i=1}^{n}(y_i - \bar{Y})^2 + \frac{1}{n^2}\sum_{i \ne j}^{n}\sum (y_i - \bar{Y})(y_j - \bar{Y})\right]$

$= \frac{1}{n^2}\sum_{i=1}^{n} E(y_i - \bar{Y})^2 + \frac{1}{n^2}\sum_{i \ne j}^{n}\sum E(y_i - \bar{Y})(y_j - \bar{Y})$

$= \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 + \frac{K}{n^2} = \frac{N-1}{Nn}S^2 + \frac{K}{n^2}$

where

$K = \sum_{i \ne j}^{n}\sum E(y_i - \bar{Y})(y_j - \bar{Y}).$

Now we find the value of K under the setups of SRSWR and SRSWOR.
• SRSWOR

$K = \sum_{i \ne j}^{n}\sum E(y_i - \bar{Y})(y_j - \bar{Y}).$

Consider

$E(y_i - \bar{Y})(y_j - \bar{Y}) = \frac{1}{N(N-1)}\sum_{k \ne l}^{N}\sum (Y_k - \bar{Y})(Y_l - \bar{Y}).$

Since

$\left[\sum_{k=1}^{N}(Y_k - \bar{Y})\right]^2 = \sum_{k=1}^{N}(Y_k - \bar{Y})^2 + \sum_{k \ne l}^{N}\sum (Y_k - \bar{Y})(Y_l - \bar{Y})$

$0 = (N-1)S^2 + \sum_{k \ne l}^{N}\sum (Y_k - \bar{Y})(Y_l - \bar{Y}),$

we get

$E(y_i - \bar{Y})(y_j - \bar{Y}) = \frac{1}{N(N-1)}\left[-(N-1)S^2\right] = -\frac{S^2}{N}.$

Thus

$K = -n(n-1)\frac{S^2}{N}.$
and substituting the value of K, the variance of $\bar{y}$ under SRSWOR is

$V(\bar{y}_{WOR}) = \frac{N-1}{Nn}S^2 - \frac{1}{n^2}\,n(n-1)\frac{S^2}{N} = \frac{N-n}{Nn}S^2.$

• SRSWR

Now we obtain the value of K under SRSWR:

$K = \sum_{i \ne j}^{n}\sum E(y_i - \bar{Y})(y_j - \bar{Y}) = \sum_{i \ne j}^{n}\sum E(y_i - \bar{Y})\,E(y_j - \bar{Y}) = 0$

because the ith and jth draws ($i \ne j$) are independent. Thus the variance of $\bar{y}$ under SRSWR is

$V(\bar{y}_{WR}) = \frac{N-1}{Nn}S^2.$
It is to be noted that if N is infinite (large enough), then

$V(\bar{y}) = \frac{S^2}{n}$

in both the cases of SRSWOR and SRSWR. So the factor $\frac{N-n}{N}$ is responsible for changing the variance of $\bar{y}$ when
the sample is drawn from a finite population in comparison to an infinite population. This is why $\frac{N-n}{N}$ is called the
finite population correction (fpc).

It may be noted that $\frac{N-n}{N} = 1 - \frac{n}{N}$, so $\frac{N-n}{N}$ is close to 1 if the ratio of sample size to population size $\frac{n}{N}$
is very small or negligible. In such a case, the size of the population has no direct effect on the variance of $\bar{y}$.

The term $\frac{n}{N}$ is called the sampling fraction.

In practice, the fpc can be ignored whenever $\frac{n}{N} < 5\%$, and for many purposes even if it is as high as 10%.

Ignoring the fpc will result in an overestimation of the variance of $\bar{y}$.
Efficiency of $\bar{y}$ under SRSWOR over SRSWR:

Now we compare the variances of sample means under SRSWOR and SRSWR:

$V(\bar{y})_{WOR} = \frac{N-n}{Nn}S^2$

$V(\bar{y})_{WR} = \frac{N-1}{Nn}S^2 = \frac{N-n}{Nn}S^2 + \frac{n-1}{Nn}S^2 = V(\bar{y})_{WOR} + \text{a positive quantity}.$

Thus

$V(\bar{y})_{WR} > V(\bar{y})_{WOR}$

and so SRSWOR is more efficient than SRSWR.
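Both variance formulas, and the efficiency comparison, can be verified exactly by enumerating every possible sample of a small population (a sketch with made-up values):

from itertools import combinations, product
from statistics import mean

# Hypothetical small population.
Y = [2, 4, 6, 8, 10]
N, n = len(Y), 2
Ybar = mean(Y)
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)

# Exact variance of the sample mean over all equally likely samples.
v_wor = mean((mean(s) - Ybar) ** 2 for s in combinations(Y, n))
v_wr = mean((mean(s) - Ybar) ** 2 for s in product(Y, repeat=n))

print(v_wor, (N - n) / (N * n) * S2)  # 3.0 3.0
print(v_wr, (N - 1) / (N * n) * S2)   # 4.0 4.0 (larger than v_wor)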
Estimation of variance from a sample

Since the expressions for the variances of the sample mean involve $S^2$, which is based on population values, these
expressions cannot be used in real life applications. In order to estimate the variance of $\bar{y}$ on the basis of a sample, an
estimator of $S^2$ (or equivalently $\sigma^2$) is needed. Consider $s^2$ as an estimator of $S^2$ (or $\sigma^2$) and let us
investigate whether it is unbiased for $S^2$ in the cases of SRSWOR and SRSWR.

Consider

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left[(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\right]^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}(y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2\right]$

$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}E(y_i - \bar{Y})^2 - n\,E(\bar{y} - \bar{Y})^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n}\mathrm{Var}(y_i) - n\,\mathrm{Var}(\bar{y})\right] = \frac{1}{n-1}\left[n\sigma^2 - n\,\mathrm{Var}(\bar{y})\right].$
In the case of SRSWOR,

$\mathrm{Var}(\bar{y})_{WOR} = \frac{N-n}{Nn}S^2$

and so

$E(s^2) = \frac{n}{n-1}\left[\sigma^2 - \frac{N-n}{Nn}S^2\right] = \frac{n}{n-1}\left[\frac{N-1}{N}S^2 - \frac{N-n}{Nn}S^2\right] = S^2.$

In the case of SRSWR,

$\mathrm{Var}(\bar{y})_{WR} = \frac{N-1}{Nn}S^2$

and so

$E(s^2) = \frac{n}{n-1}\left[\sigma^2 - \frac{N-1}{Nn}S^2\right] = \frac{n}{n-1}\left[\frac{N-1}{N}S^2 - \frac{N-1}{Nn}S^2\right] = \frac{N-1}{N}S^2 = \sigma^2.$

Hence

$E(s^2) = \begin{cases} S^2 & \text{in SRSWOR} \\ \sigma^2 & \text{in SRSWR.} \end{cases}$
An unbiased estimate of $\mathrm{Var}(\bar{y})$ is

$\widehat{\mathrm{Var}}(\bar{y})_{WOR} = \frac{N-n}{Nn}s^2$ in the case of SRSWOR, and

$\widehat{\mathrm{Var}}(\bar{y})_{WR} = \frac{N-1}{Nn}\cdot\frac{N}{N-1}s^2 = \frac{s^2}{n}$ in the case of SRSWR.
Sampling Theory
MODULE II

LECTURE - 5
SIMPLE RANDOM SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Standard Errors

The standard error of $\bar{y}$ is defined as $\sqrt{\mathrm{Var}(\bar{y})}$.

In order to estimate the standard error, one simple option is to consider the square root of the estimate of the variance of
the sample mean:

• under SRSWOR, a possible estimator is $\hat{\sigma}(\bar{y}) = \sqrt{\frac{N-n}{Nn}}\,s$;

• under SRSWR, a possible estimator is $\hat{\sigma}(\bar{y}) = \sqrt{\frac{N-1}{Nn}}\,s$.

It is to be noted that this estimator does not possess the same properties as $\widehat{\mathrm{Var}}(\bar{y})$.
The reason is that if $\hat{\theta}$ is an estimator of $\theta$, then $\sqrt{\hat{\theta}}$ is not necessarily an estimator of $\sqrt{\theta}$.

In fact, $\hat{\sigma}(\bar{y})$ is a negatively biased estimator under SRSWOR.
The approximate expressions for large N are as follows
(Reference: Sampling Theory of Surveys with Applications, P.V. Sukhatme, B.V. Sukhatme, S. Sukhatme, C. Asok, Iowa
State University Press and Indian Society of Agricultural Statistics, 1984, India).

Consider s as an estimator of S and let

$s^2 = S^2 + \varepsilon \quad \text{with} \quad E(\varepsilon) = 0, \; E(\varepsilon^2) = \mathrm{Var}(s^2).$

Write

$s = (S^2 + \varepsilon)^{1/2} = S\left(1 + \frac{\varepsilon}{S^2}\right)^{1/2} = S\left(1 + \frac{\varepsilon}{2S^2} - \frac{\varepsilon^2}{8S^4} + \cdots\right),$

assuming $\varepsilon$ is small compared to $S^2$; as n becomes large, the probability of such an event approaches one.

Neglecting the powers of $\varepsilon$ higher than two and taking expectation, we have
$E(s) \simeq S\left[1 - \frac{\mathrm{Var}(s^2)}{8S^4}\right]$

where

$\mathrm{Var}(s^2) = \frac{2S^4}{n-1}\left[1 + \frac{n-1}{2n}(\beta_2 - 3)\right]$ for large N,

$\mu_j = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^j, \qquad \beta_2 = \frac{\mu_4}{S^4}$ : coefficient of kurtosis.

Thus

$E(s) \simeq S\left[1 - \frac{1}{4(n-1)} - \frac{\beta_2 - 3}{8n}\right]$

$\mathrm{Var}(s) \simeq S^2 - S^2\left[1 - \frac{\mathrm{Var}(s^2)}{8S^4}\right]^2 \simeq \frac{\mathrm{Var}(s^2)}{4S^2} = \frac{S^2}{2(n-1)}\left[1 + \frac{n-1}{2n}(\beta_2 - 3)\right].$
Note that for a normal distribution, $\beta_2 = 3$, and we obtain

$\mathrm{Var}(s) = \frac{S^2}{2(n-1)}.$

Both $\mathrm{Var}(s)$ and $\mathrm{Var}(s^2)$ are inflated due to nonnormality to the same extent, by the inflation factor

$\left[1 + \frac{n-1}{2n}(\beta_2 - 3)\right],$

and this does not depend on the coefficient of skewness.

This is an important result to be kept in mind while determining the sample size, in which it is assumed that $S^2$ is known.
If the inflation factor is ignored and the population is non-normal, then the reliability of $s^2$ may be misleading.
Alternative approach:

The results for the unbiasedness property and the variance of the sample mean can also be proved in an alternative way as
follows.

(i) SRSWOR

With the ith unit of the population, we associate a random variable $a_i$ defined as

$a_i = \begin{cases} 1 & \text{if the ith unit occurs in the sample} \\ 0 & \text{if the ith unit does not occur in the sample} \end{cases} \quad (i = 1, 2, \ldots, N).$

Then,

$E(a_i) = 1 \times \text{Probability that the ith unit is included in the sample} = \frac{n}{N}, \quad i = 1, 2, \ldots, N$

$E(a_i^2) = 1 \times \text{Probability that the ith unit is included in the sample} = \frac{n}{N}, \quad i = 1, 2, \ldots, N$

$E(a_i a_j) = 1 \times \text{Probability that the ith and jth units are both included in the sample} = \frac{n(n-1)}{N(N-1)}, \quad i \ne j = 1, 2, \ldots, N.$
From these results, we can obtain

$\mathrm{Var}(a_i) = E(a_i^2) - \left(E(a_i)\right)^2 = \frac{n(N-n)}{N^2}, \quad i = 1, 2, \ldots, N$

$\mathrm{Cov}(a_i, a_j) = E(a_i a_j) - E(a_i)E(a_j) = -\frac{n(N-n)}{N^2(N-1)}, \quad i \ne j = 1, 2, \ldots, N.$

We can rewrite the sample mean as

$\bar{y} = \frac{1}{n}\sum_{i=1}^{N} a_i Y_i.$

Then

$E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{N} E(a_i) Y_i = \bar{Y}$

and

$\mathrm{Var}(\bar{y}) = \frac{1}{n^2}\,\mathrm{Var}\left(\sum_{i=1}^{N} a_i Y_i\right) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\mathrm{Var}(a_i)Y_i^2 + \sum_{i \ne j}^{N}\sum \mathrm{Cov}(a_i, a_j)Y_i Y_j\right].$

Substituting the values of $\mathrm{Var}(a_i)$ and $\mathrm{Cov}(a_i, a_j)$ in the expression of $\mathrm{Var}(\bar{y})$ and simplifying, we get

$\mathrm{Var}(\bar{y}) = \frac{N-n}{Nn}S^2.$

To show that $E(s^2) = S^2$, consider

$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{N} a_i Y_i^2 - n\bar{y}^2\right].$

Hence, taking expectation, we get

$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{N} E(a_i)Y_i^2 - n\left\{\mathrm{Var}(\bar{y}) + \bar{Y}^2\right\}\right].$

Substituting the values of $E(a_i)$ and $\mathrm{Var}(\bar{y})$ in this expression and simplifying, we get $E(s^2) = S^2$.
(ii) SRSWR

Let the random variable $a_i$ associated with the ith unit of the population denote the number of times the ith unit occurs in
the sample, $i = 1, 2, \ldots, N$. So $a_i$ assumes the values $0, 1, 2, \ldots, n$. The joint distribution of $a_1, a_2, \ldots, a_N$ is the multinomial
distribution given by

$P(a_1, a_2, \ldots, a_N) = \frac{n!}{\prod_{i=1}^{N} a_i!}\cdot\frac{1}{N^n}$

where $\sum_{i=1}^{N} a_i = n$. For this multinomial distribution, we have

$E(a_i) = \frac{n}{N},$

$\mathrm{Var}(a_i) = \frac{n(N-1)}{N^2}, \quad i = 1, 2, \ldots, N,$

$\mathrm{Cov}(a_i, a_j) = -\frac{n}{N^2}, \quad i \ne j = 1, 2, \ldots, N.$

We rewrite the sample mean as

$\bar{y} = \frac{1}{n}\sum_{i=1}^{N} a_i Y_i.$

Hence, taking the expectation of $\bar{y}$ and substituting $E(a_i) = n/N$, we obtain $E(\bar{y}) = \bar{Y}$.
Further,

$\mathrm{Var}(\bar{y}) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\mathrm{Var}(a_i)Y_i^2 + \sum_{i \ne j}^{N}\sum \mathrm{Cov}(a_i, a_j)Y_i Y_j\right].$

Substituting the values $\mathrm{Var}(a_i) = n(N-1)/N^2$ and $\mathrm{Cov}(a_i, a_j) = -n/N^2$ and simplifying, we get

$\mathrm{Var}(\bar{y}) = \frac{N-1}{Nn}S^2.$

To prove that $E(s^2) = \frac{N-1}{N}S^2 = \sigma^2$ in SRSWR, consider

$(n-1)s^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 = \sum_{i=1}^{N} a_i Y_i^2 - n\bar{y}^2,$

$(n-1)E(s^2) = \sum_{i=1}^{N} E(a_i)Y_i^2 - n\left\{\mathrm{Var}(\bar{y}) + \bar{Y}^2\right\} = \frac{n}{N}\sum_{i=1}^{N} Y_i^2 - n\cdot\frac{N-1}{nN}S^2 - n\bar{Y}^2 = \frac{(n-1)(N-1)}{N}S^2,$

$E(s^2) = \frac{N-1}{N}S^2 = \sigma^2.$
Estimator of population total

Sometimes it is also of interest to estimate the population total, e.g., total household income, total expenditure, etc.
Let $Y_T$ denote the population total

$Y_T = \sum_{i=1}^{N} Y_i = N\bar{Y},$

which can be estimated by

$\hat{Y}_T = N\hat{\bar{Y}} = N\bar{y}.$

Obviously,

$E(\hat{Y}_T) = N\,E(\bar{y}) = N\bar{Y} = Y_T,$

$\mathrm{Var}(\hat{Y}_T) = N^2\,\mathrm{Var}(\bar{y}) = \begin{cases} N^2\,\frac{N-n}{Nn}S^2 = \frac{N(N-n)}{n}S^2 & \text{for SRSWOR} \\ N^2\,\frac{N-1}{Nn}S^2 = \frac{N(N-1)}{n}S^2 & \text{for SRSWR} \end{cases}$

and the estimates of the variance of $\hat{Y}_T$ are

$\widehat{\mathrm{Var}}(\hat{Y}_T) = \begin{cases} \frac{N(N-n)}{n}s^2 & \text{for SRSWOR} \\ \frac{N^2}{n}s^2 & \text{for SRSWR.} \end{cases}$
Confidence limits for the population mean

Now we construct the $100(1-\alpha)\%$ confidence interval for the population mean.

Assume that the population is normally distributed $N(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$. Then
$\frac{\bar{y} - \bar{Y}}{\sqrt{\mathrm{Var}(\bar{y})}}$ follows N(0, 1) when $\sigma^2$ is known.

If $\sigma^2$ is unknown and is estimated from the sample, then $\frac{\bar{y} - \bar{Y}}{\sqrt{\widehat{\mathrm{Var}}(\bar{y})}}$ follows a
t-distribution with (n - 1) degrees of freedom.

When $\sigma^2$ is known, then the $100(1-\alpha)\%$ confidence interval is given by

$P\left[-Z_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{\sqrt{\mathrm{Var}(\bar{y})}} \le Z_{\alpha/2}\right] = 1 - \alpha$

or

$P\left[\bar{y} - Z_{\alpha/2}\sqrt{\mathrm{Var}(\bar{y})} \le \bar{Y} \le \bar{y} + Z_{\alpha/2}\sqrt{\mathrm{Var}(\bar{y})}\right] = 1 - \alpha,$

and the confidence limits are

$\left[\bar{y} - Z_{\alpha/2}\sqrt{\mathrm{Var}(\bar{y})}, \; \bar{y} + Z_{\alpha/2}\sqrt{\mathrm{Var}(\bar{y})}\right]$

where $Z_{\alpha/2}$ denotes the upper $\frac{\alpha}{2}\%$ point of the N(0, 1) distribution.
Similarly, when $\sigma^2$ is unknown, then the $100(1-\alpha)\%$ confidence interval is

$P\left[-t_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{\sqrt{\widehat{\mathrm{Var}}(\bar{y})}} \le t_{\alpha/2}\right] = 1 - \alpha$

or

$P\left[\bar{y} - t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\bar{y})} \le \bar{Y} \le \bar{y} + t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\bar{y})}\right] = 1 - \alpha,$

and the confidence limits are

$\left[\bar{y} - t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\bar{y})}, \; \bar{y} + t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\bar{y})}\right]$

where $t_{\alpha/2}$ denotes the upper $\frac{\alpha}{2}\%$ point of the t-distribution with (n - 1) degrees of freedom.
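A small numerical sketch of these limits under SRSWOR (the data are made up, and scipy is assumed to be available for the t quantile):

from math import sqrt
from statistics import mean, variance
from scipy.stats import t

# Made-up SRSWOR sample of n = 8 units from a population of N = 100.
N, alpha = 100, 0.05
y = [12.1, 9.8, 11.4, 10.6, 12.9, 9.5, 11.0, 10.2]
n = len(y)

ybar = mean(y)
s2 = variance(y)                    # sample variance s^2
var_hat = (N - n) / (N * n) * s2    # estimated Var(ybar) under SRSWOR

half = t.ppf(1 - alpha / 2, n - 1) * sqrt(var_hat)
print(f"95% confidence interval: ({ybar - half:.2f}, {ybar + half:.2f})")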
Sampling Theory
MODULE II

LECTURE - 6
SIMPLE RANDOM SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Determination of sample size

The size of the sample is needed before the survey starts and goes into operation. One point to be kept in mind is that
when the sample size increases, the variance of estimators decreases but the cost of survey increases and vice versa.
So there has to be a balance between the two aspects. The sample size can be determined on the basis of prescribed
values of standard error of sample mean, error of estimation, width of the confidence interval, coefficient of variation of
sample mean, relative error of sample mean or total cost among several others.

An important constraint in determining the sample size is that information regarding the population standard
deviation S should be known for these criteria. The reason and need for this will become clear when we derive the sample
size in the next section. A question arises about how to have information about S beforehand. One possible solution
to this issue is to conduct a pilot survey: collect a preliminary sample of small size, estimate S and use it as the known
value of S. Alternatively, such information can also be collected from past data, past experience, long association of
the experimenter with the experiment, prior information, etc.

Now we find the sample size under different criteria assuming that the samples have been drawn using SRSWOR. The
sample sizes under SRSWR can be derived similarly.

1. Pre-specified variance

The sample size is to be determined such that the variance of the sample mean does not exceed a given value, say V. In
this case, find n such that

$\mathrm{Var}(\bar{y}) \le V$

or $\frac{N-n}{Nn}S^2 \le V$

or $\frac{1}{n} - \frac{1}{N} \le \frac{V}{S^2} = \frac{1}{n_l}$

or $n \ge \frac{n_l}{1 + \frac{n_l}{N}}$

where $n_l = \frac{S^2}{V}$.

It may be noted here that $n_l$ can be known only when $S^2$ is known. This reason compels us to assume that S should be
known. The same reason will also be seen in other cases.

The smallest sample size needed in this case is

$n_{smallest} = \frac{n_l}{1 + \frac{n_l}{N}}.$

If N is large, then the required n is $n \ge n_l$ and $n_{smallest} = n_l$.
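A minimal sketch of this criterion in code (the values of S², V and N below are assumed for illustration):

from math import ceil

def n_for_variance(S2: float, V: float, N: float = float("inf")) -> int:
    """Smallest n such that Var(ybar) <= V under SRSWOR."""
    nl = S2 / V
    return ceil(nl / (1 + nl / N))

# Assumed values: S^2 = 100, target variance V = 1, population size N = 500.
print(n_for_variance(100, 1, 500))  # 84 (the fpc reduces the requirement)
print(n_for_variance(100, 1))       # 100 when N is large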
2. Pre-specified estimation error

It may be possible to have some prior knowledge of the population mean $\bar{Y}$, and it may be required that the sample mean
$\bar{y}$ should not differ from it by more than a specified amount of absolute estimation error e, which is a small quantity. Such a
requirement can be satisfied by associating a probability $(1-\alpha)$ with it and can be expressed as

$P\left[\,|\bar{y} - \bar{Y}| \le e\,\right] = 1 - \alpha.$

Since $\bar{y}$ follows $N\left(\bar{Y}, \frac{N-n}{Nn}S^2\right)$, assuming the normal distribution for the population, we can write

$P\left[\frac{|\bar{y} - \bar{Y}|}{\sqrt{\mathrm{Var}(\bar{y})}} \le \frac{e}{\sqrt{\mathrm{Var}(\bar{y})}}\right] = 1 - \alpha$

which implies that

$\frac{e}{\sqrt{\mathrm{Var}(\bar{y})}} = Z_{\alpha/2}$

or $Z_{\alpha/2}^2\,\mathrm{Var}(\bar{y}) = e^2$

or $Z_{\alpha/2}^2\,\frac{N-n}{Nn}S^2 = e^2$

or

$n = \frac{\left(\frac{Z_{\alpha/2}\,S}{e}\right)^2}{1 + \frac{1}{N}\left(\frac{Z_{\alpha/2}\,S}{e}\right)^2}$

which is the required sample size. If N is large, then

$n = \left(\frac{Z_{\alpha/2}\,S}{e}\right)^2.$
3. Pre-specified width of the confidence interval

If the requirement is that the width of the confidence interval of $\bar{y}$ with confidence coefficient $(1-\alpha)$ should not exceed a
prespecified amount W, then the sample size n is determined such that

$2 Z_{\alpha/2}\sqrt{\mathrm{Var}(\bar{y})} \le W,$

assuming $\sigma^2$ is known and the population is normally distributed. This can be expressed as

$2 Z_{\alpha/2}\sqrt{\frac{N-n}{Nn}}\,S \le W$

or $4 Z_{\alpha/2}^2\left(\frac{1}{n} - \frac{1}{N}\right)S^2 \le W^2$

or $\frac{1}{n} \le \frac{1}{N} + \frac{W^2}{4 Z_{\alpha/2}^2 S^2}$

or

$n \ge \frac{\frac{4 Z_{\alpha/2}^2 S^2}{W^2}}{1 + \frac{1}{N}\left(\frac{4 Z_{\alpha/2}^2 S^2}{W^2}\right)}.$
The minimum sample size required is

$n_{smallest} = \frac{\frac{4 Z_{\alpha/2}^2 S^2}{W^2}}{1 + \frac{1}{N}\left(\frac{4 Z_{\alpha/2}^2 S^2}{W^2}\right)}.$

If N is large, then

$n \ge \frac{4 Z_{\alpha/2}^2 S^2}{W^2}$

and the minimum sample size needed is

$n_{smallest} = \frac{4 Z_{\alpha/2}^2 S^2}{W^2}.$
4. Pre-specified coefficient of variation

The coefficient of variation (CV) is defined as the ratio of the standard error (or standard deviation) and the mean. The
knowledge of the coefficient of variation has played an important role in sampling theory, as this information helps in
deriving efficient estimators of the population mean.

If it is desired that the coefficient of variation of $\bar{y}$ should not exceed a given or prespecified value of the coefficient of
variation, say $C_0$, then the required sample size n is to be determined such that

$CV(\bar{y}) \le C_0$

or $\frac{\sqrt{\mathrm{Var}(\bar{y})}}{\bar{Y}} \le C_0$

or $\frac{\frac{N-n}{Nn}S^2}{\bar{Y}^2} \le C_0^2$

or $\frac{1}{n} - \frac{1}{N} \le \frac{C_0^2}{C^2}$

or

$n \ge \frac{\frac{C^2}{C_0^2}}{1 + \frac{1}{N}\cdot\frac{C^2}{C_0^2}}$

where $C = \frac{S}{\bar{Y}}$ is the population coefficient of variation.

The smallest sample size needed in this case is

$n_{smallest} = \frac{\frac{C^2}{C_0^2}}{1 + \frac{1}{N}\cdot\frac{C^2}{C_0^2}}.$

If N is large, then $n \ge \frac{C^2}{C_0^2}$ and $n_{smallest} = \frac{C^2}{C_0^2}$.
5. Pre-specified relative error

When $\bar{y}$ is used for estimating the population mean $\bar{Y}$, then the relative estimation error is defined as
$\frac{\bar{y} - \bar{Y}}{\bar{Y}}$. If it is required that such relative estimation error should not exceed a prespecified value R
with probability $(1-\alpha)$, then such a requirement can be satisfied by expressing it as

$P\left[\frac{|\bar{y} - \bar{Y}|}{\sqrt{\mathrm{Var}(\bar{y})}} \le \frac{R\bar{Y}}{\sqrt{\mathrm{Var}(\bar{y})}}\right] = 1 - \alpha.$

Assuming the population to be normally distributed, $\bar{y}$ follows $N\left(\bar{Y}, \frac{N-n}{Nn}S^2\right)$. So it can be written that

$\frac{R\bar{Y}}{\sqrt{\mathrm{Var}(\bar{y})}} = Z_{\alpha/2}$

or $Z_{\alpha/2}^2\,\frac{N-n}{Nn}S^2 = R^2\bar{Y}^2$

or $\frac{1}{n} - \frac{1}{N} = \frac{R^2}{C^2 Z_{\alpha/2}^2}$

or

$n = \frac{\left(\frac{Z_{\alpha/2}\,C}{R}\right)^2}{1 + \frac{1}{N}\left(\frac{Z_{\alpha/2}\,C}{R}\right)^2}$

where $C = \frac{S}{\bar{Y}}$ is the population coefficient of variation and should be known. If N is large, then

$n = \left(\frac{Z_{\alpha/2}\,C}{R}\right)^2.$
6. Pre-specified cost

Let an amount of money C be designated for the sample survey to collect n observations. Further, let $C_0$ be the
overhead cost and $C_1$ the cost of collection of one unit in the sample. Then the total cost C can be expressed as

$C = C_0 + n C_1$

or

$n = \frac{C - C_0}{C_1}$

which is the required sample size.
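As a quick numerical check with assumed figures: a budget of C = 10,500 with overhead C0 = 500 and per-unit cost C1 = 20 allows n = (10500 - 500)/20 = 500 observations. In code:

# Assumed budget figures for illustration.
C, C0, C1 = 10_500, 500, 20
n = (C - C0) // C1   # number of observations the budget allows
print(n)             # 500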
Sampling Theory
MODULE III
LECTURE - 7
SAMPLING FOR PROPORTIONS
AND PERCENTAGES

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
In many situations, the characteristic under study on which the observations are collected is qualitative in
nature. For example, the responses of customers in many marketing surveys are based on replies like ‘yes’ or
‘no’, ‘agree’ or ‘disagree’, etc. Sometimes the respondents are asked to rank several options, like first
choice, second choice, etc. Sometimes the objective of the survey is to estimate the proportion or the
percentage of brown eyed persons, unemployed persons, graduates or persons favoring a proposal,
etc. In such situations, the first question is how to do the sampling and, secondly, how to estimate the
population parameters like the population mean, population variance, etc.

Sampling procedure:

The same sampling procedures that are used for drawing a sample in the case of quantitative characteristics can
also be used for drawing a sample for a qualitative characteristic. So the sampling procedures remain the same
irrespective of the nature of the characteristic under study, whether qualitative or quantitative. For example, the
SRSWOR and SRSWR procedures for drawing samples remain the same for qualitative and quantitative
characteristics. Similarly, other sampling schemes like stratified sampling, two stage sampling, etc. also remain
the same.
Estimation of population proportion:

The population proportion in the case of a qualitative characteristic can be estimated in a similar way as the
estimation of the population mean in the case of a quantitative characteristic.

Consider a qualitative characteristic based on which the population can be divided into two mutually exclusive
classes, say C and C*. For example, if C is the part of the population of persons saying ‘yes’ or ‘agreeing’ with
the proposal, then C* is the part of the population of persons saying ‘no’ or ‘disagreeing’ with the proposal. Let A
be the number of units in C and (N - A) units be in C* in a population of size N. Then the proportion of units in C is

$P = \frac{A}{N}$

and the proportion of units in C* is

$Q = \frac{N-A}{N} = 1 - P.$

An indicator variable Y can be associated with the characteristic under study, and then for $i = 1, 2, \ldots, N$,

$Y_i = \begin{cases} 1 & \text{if the ith unit belongs to } C \\ 0 & \text{if the ith unit belongs to } C^*. \end{cases}$

Now the population total is

$Y_{TOTAL} = \sum_{i=1}^{N} Y_i = A$

and the population mean is

$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i = \frac{A}{N} = P.$
Suppose a sample of size n is drawn from the population of size N by simple random sampling. Let a be the
number of units in the sample which fall into class C and (n - a) units which fall into class C*. Then the sample
proportion of units in C is

$p = \frac{a}{n},$

which can be written as

$p = \frac{a}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}.$

Since $\sum_{i=1}^{N} Y_i^2 = A = NP$ (as $Y_i$ takes only the values 0 and 1), we can write $S^2$ and $s^2$ in terms of P and Q as

$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right) = \frac{1}{N-1}\left(NP - NP^2\right) = \frac{N}{N-1}PQ.$
Similarly,

$\sum_{i=1}^{n} y_i^2 = a = np$

and

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right) = \frac{1}{n-1}\left(np - np^2\right) = \frac{n}{n-1}pq.$

Note that the quantities $\bar{y}, \bar{Y}, s^2$ and $S^2$ have been expressed as functions of the sample and population
proportions. Since the sample has been drawn by simple random sampling and the sample proportion is the same as
the sample mean, the properties of the sample proportion under SRSWOR and SRSWR can be derived using the
properties of the sample mean directly.
1. SRSWOR

Since the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$, i.e., $E(\bar{y}) = \bar{Y}$ in the case of SRSWOR,
we have

$E(p) = E(\bar{y}) = \bar{Y} = P$

and p is an unbiased estimator of P.

Using the expression for $\mathrm{Var}(\bar{y})$, the variance of p can be derived as

$\mathrm{Var}(p) = \mathrm{Var}(\bar{y}) = \frac{N-n}{Nn}S^2 = \frac{N-n}{Nn}\cdot\frac{N}{N-1}PQ = \frac{N-n}{N-1}\cdot\frac{PQ}{n}.$

Similarly, using the estimate of $\mathrm{Var}(\bar{y})$, the estimate of the variance can be derived as

$\widehat{\mathrm{Var}}(p) = \widehat{\mathrm{Var}}(\bar{y}) = \frac{N-n}{Nn}s^2 = \frac{N-n}{Nn}\cdot\frac{n}{n-1}pq = \frac{N-n}{N(n-1)}pq.$
2. SRSWR

Since the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$ in the case of SRSWR, the sample
proportion satisfies

$E(p) = E(\bar{y}) = \bar{Y} = P,$

i.e., p is an unbiased estimator of P.

Using the expression for $\mathrm{Var}(\bar{y})$ and its estimate in the case of SRSWR, the variance of p and its estimate can be
derived as follows:

$\mathrm{Var}(p) = \mathrm{Var}(\bar{y}) = \frac{N-1}{Nn}S^2 = \frac{N-1}{Nn}\cdot\frac{N}{N-1}PQ = \frac{PQ}{n},$

$\widehat{\mathrm{Var}}(p) = \frac{n}{n-1}\cdot\frac{pq}{n} = \frac{pq}{n-1}.$
Estimation of the population total or total number of count

It is easy to see that an estimate of the population total A (or the total number of count) is $\hat{A} = Np = \frac{Na}{n}$,
its variance is

$\mathrm{Var}(\hat{A}) = N^2\,\mathrm{Var}(p)$

and the estimate of its variance is

$\widehat{\mathrm{Var}}(\hat{A}) = N^2\,\widehat{\mathrm{Var}}(p).$

Confidence interval estimation of P

If N and n are large, then $\frac{p - P}{\sqrt{\mathrm{Var}(p)}}$ approximately follows N(0, 1). With this approximation, we can write

$P\left[-Z_{\alpha/2} \le \frac{p - P}{\sqrt{\mathrm{Var}(p)}} \le Z_{\alpha/2}\right] = 1 - \alpha$

and the $100(1-\alpha)\%$ confidence interval of P is

$\left[p - Z_{\alpha/2}\sqrt{\mathrm{Var}(p)}, \; p + Z_{\alpha/2}\sqrt{\mathrm{Var}(p)}\right].$

It may be noted that in this case a discrete random variable is being approximated by a continuous random
variable, so a continuity correction $\frac{1}{2n}$ can be introduced in the confidence limits, and the limits become

$\left[p - Z_{\alpha/2}\sqrt{\mathrm{Var}(p)} - \frac{1}{2n}, \; p + Z_{\alpha/2}\sqrt{\mathrm{Var}(p)} + \frac{1}{2n}\right].$
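A numerical sketch of this interval under SRSWOR (the counts are assumed; Z = 1.96 corresponds to a 95% interval):

from math import sqrt

# Assumed counts: a = 132 'yes' answers in an SRSWOR sample of n = 400
# drawn from a population of N = 10000 units.
N, n, a = 10_000, 400, 132
z = 1.96                                   # Z_{alpha/2} for a 95% interval

p = a / n
q = 1 - p
var_hat = (N - n) / (N * (n - 1)) * p * q  # estimated Var(p) under SRSWOR

half = z * sqrt(var_hat) + 1 / (2 * n)     # with continuity correction
print(f"95% CI for P: ({p - half:.3f}, {p + half:.3f})")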
Sampling Theory
MODULE III
LECTURE - 8
SAMPLING FOR PROPORTIONS
AND PERCENTAGES

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Use of the hypergeometric distribution:

When SRS is applied for sampling a qualitative characteristic, the methodology is to draw the units one by one, and
so the probability of selection of every unit remains the same at every step. If n sampling units are selected together
from N units, then the probability of selection of units does not remain the same as in the case of SRS.

Consider a situation in which the sampling units in a population are divided into two mutually exclusive classes.
Let P and Q be the proportions of sampling units in the population belonging to classes ‘1’ and ‘2’ respectively.
Then NP and NQ are the total numbers of sampling units in the population belonging to classes ‘1’ and ‘2’
respectively, and so NP + NQ = N. The probability that, in a sample of n units selected out of N units by SRS,
$n_1$ selected units belong to class ‘1’ and $n_2$ selected units belong to class ‘2’ is governed by the hypergeometric
distribution:

$P(n_1) = \frac{\binom{NP}{n_1}\binom{NQ}{n_2}}{\binom{N}{n}}.$

As N grows large, the hypergeometric distribution tends to the binomial distribution, and $P(n_1)$ is approximated by

$P(n_1) = \binom{n}{n_1}\,p^{n_1}(1-p)^{n_2}.$
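This approximation can be checked numerically (a sketch with arbitrary numbers; scipy is assumed to be available):

from scipy.stats import hypergeom, binom

# Assumed values: population N = 2000 with proportion P = 0.3 in class '1',
# a sample of n = 20, and n1 = 7 sampled units falling in class '1'.
N, P, n, n1 = 2000, 0.3, 20, 7

# scipy's hypergeom(M, K, n): M = population size, K = units in class '1',
# n = number of draws.
p_hyper = hypergeom(N, int(N * P), n).pmf(n1)
p_binom = binom(n, P).pmf(n1)
print(round(p_hyper, 4), round(p_binom, 4))  # the two should be close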
Inverse sampling:

In general, it is understood in the SRS methodology for a qualitative characteristic that the attribute under study is
not a rare attribute. If the attribute is rare, then the procedure of estimating the population proportion P by the
sample proportion p = a/n is not suitable. Some such situations are, e.g., estimation of the frequency of a rare type of
gene, the proportion of some rare type of cancer cells in a biopsy, the proportion of a rare type of blood cells affecting
the red blood cells, etc. In such cases, the methodology of inverse sampling can be used.

In the methodology of inverse sampling, the sampling is continued until a predetermined number of units
possessing the attribute under study occur in the sampling, which is useful for estimating the population
proportion. The sampling units are drawn one by one with equal probability and without replacement. The
sampling is discontinued as soon as the number of units in the sample possessing the characteristic or attribute
equals a predetermined number.

Let m denote the predetermined number indicating the number of units possessing the characteristic. The
sampling is continued till m such units are obtained. Therefore, the sample size n required to attain
m becomes a random variable.
Probability distribution function of n:

In order to find the probability distribution function of n, consider the stage of drawing samples t such that at t = n,
the sample of size n completes the m units with the attribute. Thus the first (n - 1) draws would contain (m - 1) units in the
sample possessing the characteristic out of NP units. Equivalently, there are (n - m) units which do not possess the
characteristic, out of NQ such units in the population. Note that the last draw must ensure that the unit selected
possesses the characteristic.

So the probability distribution function of n can be expressed as

$P(n) = P\left[\text{in a sample of (n - 1) units drawn from N, (m - 1) units will possess the attribute}\right] \times P\left[\text{the unit drawn at the nth draw will possess the attribute}\right]$

$= \left[\frac{\binom{NP}{m-1}\binom{NQ}{n-m}}{\binom{N}{n-1}}\right]\left(\frac{NP - m + 1}{N - n + 1}\right), \quad n = m, m+1, \ldots, m + NQ.$
Note that the first term (in square brackets) is derived using the hypergeometric distribution as the probability of drawing
a sample of size (n - 1) in which (m - 1) units are from NP units and (n - m) units are from NQ units. The second
term, $\frac{NP - m + 1}{N - n + 1}$, is the probability associated with the last draw, where it is required that we get a unit possessing
the characteristic.

Note that $\sum_{n=m}^{m+NQ} P(n) = 1.$
Estimate of the population proportion

Consider the expectation of $\frac{m-1}{n-1}$:

$E\left[\frac{m-1}{n-1}\right] = \sum_{n=m}^{m+NQ} \frac{m-1}{n-1}\,P(n)$

$= \sum_{n=m}^{m+NQ} \frac{m-1}{n-1}\cdot\frac{\binom{NP}{m-1}\binom{NQ}{n-m}}{\binom{N}{n-1}}\cdot\frac{NP - m + 1}{N - n + 1}$

$= P\sum_{n=m}^{m+NQ} \frac{\binom{NP-1}{m-2}\binom{NQ}{n-m}}{\binom{N-1}{n-2}}\cdot\frac{NP - m + 1}{N - n + 1}$

which is obtained by replacing NP by NP - 1, N by N - 1, m by (m - 1) and n by (n - 1) in the earlier step. The terms
being summed are the probability distribution of n under this substitution, so they sum to one.

Thus

$E\left[\frac{m-1}{n-1}\right] = P.$

So

$\hat{P} = \frac{m-1}{n-1}$

is an unbiased estimator of P.
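A simulation sketch of this estimator (all figures are assumed; the population is shuffled and drawn without replacement until m units with the attribute appear):

import random

# Assumed rare-attribute population: N = 1000 units, of which NP = 50
# possess the attribute; sampling stops at m = 5 such units.
N, NP, m = 1000, 50, 5
P = NP / N
pop = [1] * NP + [0] * (N - NP)

def inverse_sample():
    """Draw WOR until m units with the attribute occur; return (m-1)/(n-1)."""
    random.shuffle(pop)
    hits = 0
    for n, unit in enumerate(pop, start=1):
        hits += unit
        if hits == m:
            return (m - 1) / (n - 1)

reps = 5_000
est = sum(inverse_sample() for _ in range(reps)) / reps
print(P, round(est, 4))  # the average estimate should be close to P = 0.05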
Estimate of the variance of P̂:

Now we derive an estimate of the variance of $\hat{P}$. By definition,

$\mathrm{Var}(\hat{P}) = E(\hat{P}^2) - \left[E(\hat{P})\right]^2 = E(\hat{P}^2) - P^2.$

Thus

$\widehat{\mathrm{Var}}(\hat{P}) = \hat{P}^2 - \widehat{P^2}.$

In order to obtain an estimate of $P^2$, consider the expectation of $\frac{(m-1)(m-2)}{(n-1)(n-2)}$, i.e.,

$E\left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] = \sum_{n \ge m} \frac{(m-1)(m-2)}{(n-1)(n-2)}\,P(n) = \frac{P(NP-1)}{N-1}\sum_{n \ge m}\left[\frac{\binom{NP-2}{m-3}\binom{NQ}{n-m}}{\binom{N-2}{n-3}}\cdot\frac{NP - m + 1}{N - n + 1}\right]$

where the term inside the square brackets is obtained by replacing NP by (NP - 2), N by (N - 2), m by (m - 2) and
n by (n - 2) in the probability distribution function of n; these terms sum to one. This solves further to

$E\left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] = \frac{NP^2}{N-1} - \frac{P}{N-1}.$
Thus an unbiased estimate of $P^2$ is

$\widehat{P^2} = \frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{\hat{P}}{N} = \frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{1}{N}\cdot\frac{m-1}{n-1}.$

Finally, an estimate of the variance of $\hat{P}$ is

$\widehat{\mathrm{Var}}(\hat{P}) = \hat{P}^2 - \widehat{P^2} = \left(\frac{m-1}{n-1}\right)^2 - \left[\frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{1}{N}\cdot\frac{m-1}{n-1}\right]$

$= \frac{m-1}{n-1}\left[\frac{m-1}{n-1} - \frac{1}{N}\left(1 + \frac{(N-1)(m-2)}{n-2}\right)\right].$

For large N, the hypergeometric distribution tends to the negative binomial distribution with probability density function

$\binom{n-1}{m-1}P^m Q^{n-m}.$

So

$\hat{P} = \frac{m-1}{n-1}$

and

$\widehat{\mathrm{Var}}(\hat{P}) = \frac{(m-1)(n-m)}{(n-1)^2(n-2)} = \frac{\hat{P}(1-\hat{P})}{n-2}.$
Estimation of proportions for more than two classes

We have assumed up to now that there are only two classes into which the population can be divided based on a
qualitative characteristic. There can be situations when the population is to be divided into more than two classes.

For example, the taste of a coffee can be divided into four categories: very strong, strong, mild and very mild. Similarly,
in another example, the damage to a crop due to a storm can be classified into categories like heavily damaged, damaged,
minor damage and no damage.

These types of situations can be represented by dividing the population of size N into, say, k mutually exclusive
classes $C_1, C_2, \ldots, C_k$. Corresponding to these classes, let

$P_1 = \frac{C_1}{N}, \; P_2 = \frac{C_2}{N}, \;\ldots,\; P_k = \frac{C_k}{N}$

be the proportions of units in the classes $C_1, C_2, \ldots, C_k$ respectively (here $C_i$ also denotes the number of units in
the ith class).

Let a sample of size n be drawn such that $c_1, c_2, \ldots, c_k$ units have been drawn from $C_1, C_2, \ldots, C_k$
respectively. Then the probability of observing $c_1, c_2, \ldots, c_k$ is

$P(c_1, c_2, \ldots, c_k) = \frac{\binom{C_1}{c_1}\binom{C_2}{c_2}\cdots\binom{C_k}{c_k}}{\binom{N}{n}}.$
The population proportions $P_i$ can be estimated by $p_i = \frac{c_i}{n}, \; i = 1, 2, \ldots, k$.

It can easily be shown that

$E(p_i) = P_i, \quad i = 1, 2, \ldots, k,$

$\mathrm{Var}(p_i) = \frac{N-n}{N-1}\cdot\frac{P_i Q_i}{n} \quad \text{and} \quad \widehat{\mathrm{Var}}(p_i) = \frac{N-n}{N}\cdot\frac{p_i q_i}{n-1}.$

For estimating the number of units in the ith class,

$\hat{C}_i = N p_i,$

$\mathrm{Var}(\hat{C}_i) = N^2\,\mathrm{Var}(p_i) \quad \text{and} \quad \widehat{\mathrm{Var}}(\hat{C}_i) = N^2\,\widehat{\mathrm{Var}}(p_i).$

The confidence intervals can be obtained based on a single $p_i$ as in the case of two classes.

If N is large, then the probability of observing $c_1, c_2, \ldots, c_k$ can be approximated by the multinomial distribution
given by

$P(c_1, c_2, \ldots, c_k) = \frac{n!}{c_1!\,c_2!\cdots c_k!}\,P_1^{c_1} P_2^{c_2}\cdots P_k^{c_k}.$

For this distribution,

$E(p_i) = P_i, \quad i = 1, 2, \ldots, k,$

$\mathrm{Var}(p_i) = \frac{P_i(1-P_i)}{n} \quad \text{and} \quad \widehat{\mathrm{Var}}(p_i) = \frac{p_i(1-p_i)}{n}.$
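For large N, the class counts can be simulated from this multinomial model (a sketch with assumed proportions and sample size):

import random

# Assumed true proportions for k = 4 taste categories.
P = {"very strong": 0.1, "strong": 0.4, "mild": 0.35, "very mild": 0.15}
n = 500

# Draw one sample of n units and estimate each class proportion by p_i = c_i / n.
draws = random.choices(list(P), weights=P.values(), k=n)
p_hat = {c: draws.count(c) / n for c in P}
print(p_hat)  # each p_i should be near its P_i, with Var(p_i) ~ P_i(1 - P_i)/n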
Sampling Theory
MODULE IV
LECTURE - 9
STRATIFIED SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
An important objective in any estimation problem is to obtain an estimator of a population parameter which can take care of all salient features of the population. If the population is homogeneous with respect to the characteristic under study, then the method of simple random sampling will yield a homogeneous sample and, in turn, the sample mean will serve as a good estimator of the population mean. Thus, if the population is homogeneous with respect to the characteristic under study, then the sample drawn through simple random sampling is expected to provide a representative sample. Moreover, the variance of the sample mean depends not only on the sample size and sampling fraction but also on the population variance. In order to increase the precision of an estimator, we have to use a sampling scheme which reduces the heterogeneity in the population. If the population is heterogeneous with respect to the characteristic under study, then one such sampling procedure is stratified sampling.

The basic idea behind stratified sampling is to

• divide the whole heterogeneous population into smaller groups or subpopulations, such that the sampling units are homogeneous with respect to the characteristic under study within the subpopulation and heterogeneous with respect to the characteristic under study between/among the subpopulations. Such subpopulations are termed strata.

• treat each subpopulation as a separate population and draw a sample by SRS from each stratum.

[Note: ‘Stratum’ is singular and ‘strata’ is plural.]
Example:

Suppose we wish to find the average height of the students in a school having classes 1 to 12. The height varies a lot, as the students in class 1 are of age around 6 years while the students in class 10 are of age around 16 years. So one can divide all the students into different subpopulations or strata such as

Students of classes 1, 2 and 3 : Stratum 1
Students of classes 4, 5 and 6 : Stratum 2
Students of classes 7, 8 and 9 : Stratum 3
Students of classes 10, 11 and 12 : Stratum 4

Now draw samples by SRS from each of the strata 1, 2, 3 and 4. All the drawn samples combined together will constitute the final stratified sample for further analysis.

Notations:

We use the following symbols and notations:

N : population size
k : number of strata
$N_i$ : number of sampling units in the ith stratum, with $N = \sum_{i=1}^{k} N_i$
$n_i$ : number of sampling units to be drawn from the ith stratum, with total sample size $n = \sum_{i=1}^{k} n_i$
Procedure of stratified sampling

• Divide the population of N units into k strata, the ith stratum having $N_i$, $i = 1, 2, \ldots, k$ units. Strata are constructed such that they are non-overlapping and homogeneous with respect to the characteristic under study, with $\sum_{i=1}^{k} N_i = N$.

• Draw a sample of size $n_i$ from the ith ($i = 1, 2, \ldots, k$) stratum using SRS (preferably WOR) independently from each stratum.

• All the sampling units drawn from the strata together constitute a stratified sample of size $n = \sum_{i=1}^{k} n_i$.
Difference between stratified and cluster sampling schemes

In stratified sampling, the strata are constructed such that they are
• within homogeneous and
• among heterogeneous.

In cluster sampling, the clusters are constructed such that they are
• within heterogeneous and
• among homogeneous.

[Note: We consider cluster sampling later.]
Issue in estimation in stratified sampling

Divide the population of N units into k strata, the ith stratum having $N_i$, $i = 1, 2, \ldots, k$ units. Note that there are k independent samples drawn through SRS of sizes $n_1, n_2, \ldots, n_k$, so one can have k estimators of a parameter based on these samples. Our interest is not in having k different estimators of the parameter; the ultimate goal is a single estimator. In this case, an important issue is how to combine the different sample information into one estimator which is good enough to provide information about the parameter.

We now consider the estimation of the population mean and population variance from a stratified sample.

Estimation of population mean and its variance

Let
Y : characteristic under study,
$y_{ij}$ : value of the jth unit in the ith stratum, $j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, k$,
$\bar Y_i = \frac{1}{N_i}\sum_{j=1}^{N_i} y_{ij}$ : population mean of the ith stratum,
$\bar y_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$ : sample mean from the ith stratum,
$\bar Y = \frac{1}{N}\sum_{i=1}^{k} N_i\bar Y_i = \sum_{i=1}^{k} w_i\bar Y_i$ : population mean, where $w_i = \frac{N_i}{N}$.
[Figure: The population of N units is divided into k strata of sizes $N_1, N_2, \ldots, N_k$ with $N = \sum_{i=1}^{k} N_i$; from the ith stratum a sample of size $n_i$ is drawn, giving total sample size $n = \sum_{i=1}^{k} n_i$.]
Estimation of population mean

First we discuss the estimation of the population mean. Note that in stratified sampling the population mean is defined as the weighted arithmetic mean of the stratum means, where the weights are provided by the stratum sizes.

Based on the expression $\bar Y = \frac{1}{N}\sum_{i=1}^{k} N_i\bar Y_i$, one may choose the sample mean

$$\bar y = \frac{1}{n}\sum_{i=1}^{k} n_i\bar y_i$$

as a possible estimator of $\bar Y$. Since the sample in each stratum is drawn by SRS, $E(\bar y_i) = \bar Y_i$, and thus

$$E(\bar y) = \frac{1}{n}\sum_{i=1}^{k} n_iE(\bar y_i) = \frac{1}{n}\sum_{i=1}^{k} n_i\bar Y_i \neq \frac{1}{N}\sum_{i=1}^{k} N_i\bar Y_i = \bar Y$$

in general (unless $n_i/n = N_i/N$ for all i), so $\bar y$ turns out to be a biased estimator of $\bar Y$. Based on this, one can modify $\bar y$ so as to obtain an unbiased estimator of $\bar Y$. Consider the stratified sample mean, defined as the weighted arithmetic mean of the stratum sample means with stratum sizes as weights:

$$\bar y_{st} = \frac{1}{N}\sum_{i=1}^{k} N_i\bar y_i.$$

Now

$$E(\bar y_{st}) = \frac{1}{N}\sum_{i=1}^{k} N_iE(\bar y_i) = \frac{1}{N}\sum_{i=1}^{k} N_i\bar Y_i = \bar Y.$$

Thus $\bar y_{st}$ is an unbiased estimator of $\bar Y$.
Variance of $\bar y_{st}$:

$$Var(\bar y_{st}) = \sum_{i=1}^{k} w_i^2\,Var(\bar y_i) + \sum_{i \neq j} w_iw_j\,Cov(\bar y_i, \bar y_j).$$

Since all the samples have been drawn independently from the strata by SRSWOR,

$$Cov(\bar y_i, \bar y_j) = 0\ (i \neq j), \qquad Var(\bar y_i) = \frac{N_i - n_i}{N_in_i}S_i^2, \qquad S_i^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i}(y_{ij} - \bar Y_i)^2.$$

Thus

$$Var(\bar y_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{N_i - n_i}{N_in_i}S_i^2 = \sum_{i=1}^{k} w_i^2\left(1 - \frac{n_i}{N_i}\right)\frac{S_i^2}{n_i}.$$

Observe that $Var(\bar y_{st})$ is small when $S_i^2$ is small. This observation suggests how to construct the strata: if $S_i^2$ is small for all $i = 1, 2, \ldots, k$, then $Var(\bar y_{st})$ will also be small. That is why it was mentioned earlier that the strata are to be constructed so that they are within homogeneous, i.e., $S_i^2$ is small, and among heterogeneous.

For example, units in geographical proximity will tend to be closer in their values. The consumption pattern of households will be similar within a lower income group housing society and within a higher income group housing society, whereas it will differ a lot between the two housing societies based on income.
Estimate of variance

Since the samples have been drawn by SRSWOR,

$$E(s_i^2) = S_i^2 \quad \text{where} \quad s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(y_{ij} - \bar y_i)^2,$$

and

$$\widehat{Var}(\bar y_i) = \frac{N_i - n_i}{N_in_i}s_i^2, \qquad \widehat{Var}(\bar y_{st}) = \sum_{i=1}^{k} w_i^2\,\widehat{Var}(\bar y_i) = \sum_{i=1}^{k} w_i^2\left(\frac{N_i - n_i}{N_in_i}\right)s_i^2.$$

Note: If SRSWR is used instead of SRSWOR for drawing the samples from each stratum, then

$$\bar y_{st} = \sum_{i=1}^{k} w_i\bar y_i, \qquad E(\bar y_{st}) = \bar Y,$$

$$Var(\bar y_{st}) = \sum_{i=1}^{k} w_i^2\left(\frac{N_i - 1}{N_in_i}\right)S_i^2 = \sum_{i=1}^{k} w_i^2\,\frac{\sigma_i^2}{n_i}, \qquad \widehat{Var}(\bar y_{st}) = \sum_{i=1}^{k}\frac{w_i^2s_i^2}{n_i},$$

where $\sigma_i^2 = \frac{1}{N_i}\sum_{j=1}^{N_i}(y_{ij} - \bar Y_i)^2$.
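The following minimal Python sketch (hypothetical data and stratum sizes) computes the stratified sample mean $\bar y_{st}$ and its estimated variance under SRSWOR as derived above.

```python
import numpy as np

# Hypothetical stratified sample: one array of observations per stratum (SRSWOR)
strata_samples = [np.array([12., 15., 11., 14.]),        # stratum 1
                  np.array([22., 25., 27., 24., 26.]),   # stratum 2
                  np.array([35., 33., 38.])]             # stratum 3
N_i = np.array([40, 55, 30])                             # assumed stratum sizes
N = N_i.sum()
w = N_i / N                                              # stratum weights w_i = N_i/N

n_i = np.array([len(s) for s in strata_samples])
ybar_i = np.array([s.mean() for s in strata_samples])    # stratum sample means
s2_i = np.array([s.var(ddof=1) for s in strata_samples]) # stratum sample variances

# Stratified sample mean and its estimated variance under SRSWOR
y_st = np.sum(w * ybar_i)
var_hat = np.sum(w**2 * (N_i - n_i) / (N_i * n_i) * s2_i)
print(y_st, var_hat)
```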
Advantages of stratified sampling

1. Data of known precision may be required for certain parts of the population. This can be accomplished with a more careful investigation of a few strata.
Example: In order to know the direct impact of a hike in petrol prices, the population can be divided into strata like lower income group, middle income group and higher income group. Obviously, the higher income group is more affected than the lower income group, so a more careful investigation can be made in the higher income group stratum.

2. Sampling problems may differ in different parts of the population.
Example: To study the consumption pattern of households, the people living in houses, hotels, hospitals, prisons etc. are to be treated differently.

3. Administrative convenience can be exercised in stratified sampling.
Example: In taking a sample of villages from a big state, it is administratively more convenient to consider the districts as strata so that the administrative setup at district level may be used for this purpose. Such administrative convenience and convenience in the organization of field work are important aspects in national level surveys.
4. A full cross-section of the population can be obtained through stratified sampling. In SRS it may happen that some large part of the population remains unrepresented. Stratified sampling enables one to draw a sample representing different segments of the population to any desired extent. The desired degree of representation of specified parts of the population is also possible.

5. Substantial gain in efficiency is achieved if the strata are formed intelligently.

6. In the case of a skewed population, stratification is of importance since a larger weight may have to be given to the few extremely large units, which in turn reduces the sampling variability.

7. When estimates are required not only for the population but also for subpopulations, stratified sampling is helpful.

8. Stratified sampling is useful when the sampling frame for the subpopulations is more easily available than the sampling frame for the whole population.

9. If the population is large, then it is convenient to sample separately from the strata rather than from the entire population.

10. The population mean or population total can be estimated with higher precision by suitably assigning weights to the estimates obtained from each stratum.
Sampling Theory
MODULE IV
LECTURE - 10
STRATIFIED SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Allocation problem and choice of sample sizes in different strata

Question: How to choose the sample sizes $n_1, n_2, \ldots, n_k$ so that the available resources are used in an effective way?

There are two aspects of choosing the sample sizes:
i. Minimize the cost of the survey for a specified precision.
ii. Maximize the precision for a given cost.

Note: The sample size cannot be determined by minimizing both the cost and the variability simultaneously. The cost function is directly proportional to the sample size, whereas the variability is inversely proportional to the sample size.

Based on different ideas, some allocation procedures are as follows:

1. Equal allocation

Choose the sample size $n_i$ to be the same for all strata, i.e., draw samples of equal size from each stratum. If n is the total sample size and k the number of strata, then

$$n_i = \frac{n}{k} \quad \text{for all } i = 1, 2, \ldots, k.$$
2. Proportional allocation

For fixed k, select $n_i$ proportional to the stratum size $N_i$, i.e.,

$$n_i \propto N_i \quad \text{or} \quad n_i = CN_i$$

where C is the constant of proportionality. Then

$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k} CN_i \ \Rightarrow\ n = CN \ \Rightarrow\ C = \frac{n}{N}.$$

Thus $n_i = \left(\frac{n}{N}\right)N_i$. Such an allocation arises from considerations like operational convenience.
3. Neyman or optimum allocation

This allocation considers the sizes of the strata as well as their variability:

$$n_i \propto N_iS_i \quad \text{or} \quad n_i = C^*N_iS_i$$

where $C^*$ is the constant of proportionality. Then

$$\sum_{i=1}^{k} n_i = \sum_{i=1}^{k} C^*N_iS_i \ \Rightarrow\ n = C^*\sum_{i=1}^{k} N_iS_i \ \Rightarrow\ C^* = \frac{n}{\sum_{i=1}^{k} N_iS_i}.$$

Thus

$$n_i = \frac{nN_iS_i}{\sum_{i=1}^{k} N_iS_i}.$$

This allocation arises when $Var(\bar y_{st})$ is minimized subject to the constraint $\sum_{i=1}^{k} n_i = n$ (prespecified).

There are some limitations of optimum allocation. The knowledge of $S_i$ $(i = 1, 2, \ldots, k)$ is needed to determine the $n_i$, and if there is more than one characteristic under study, the characteristics may lead to conflicting allocations. A short numerical sketch of these allocation rules follows.
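The following Python sketch (hypothetical stratum sizes and standard deviations) compares the three allocation rules side by side.

```python
import numpy as np

# Hypothetical strata: sizes N_i and within-stratum SDs S_i, total sample size n
N_i = np.array([400, 300, 300])
S_i = np.array([5.0, 15.0, 10.0])     # assumed values; S_i must be known/guessed
n, k = 60, len(N_i)

n_equal  = np.full(k, n / k)                      # equal allocation: n/k
n_prop   = n * N_i / N_i.sum()                    # proportional: n * N_i / N
n_neyman = n * N_i * S_i / np.sum(N_i * S_i)      # Neyman: n * N_i S_i / sum(N_i S_i)
print(n_equal, n_prop, n_neyman.round(1))
```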
Choice of sample size based on cost of survey and variability

The cost of a survey depends upon the nature of the survey. A simple choice of cost function is

$$C = C_0 + \sum_{i=1}^{k} C_in_i$$

where
C : total cost,
$C_0$ : overhead cost, e.g., setting up the office, training people etc.,
$C_i$ : cost per unit in the ith stratum,
$\sum_{i=1}^{k} C_in_i$ : total cost within the sample.

To find $n_i$ under this cost function, consider the Lagrangian function with Lagrangian multiplier $\lambda$:

$$\phi = Var(\bar y_{st}) + \lambda^2(C - C_0) = \sum_{i=1}^{k} w_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)S_i^2 + \lambda^2\sum_{i=1}^{k} C_in_i$$
$$= \sum_{i=1}^{k}\frac{w_i^2S_i^2}{n_i} + \lambda^2\sum_{i=1}^{k} C_in_i - \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i} = \sum_{i=1}^{k}\left(\frac{w_iS_i}{\sqrt{n_i}} - \lambda\sqrt{C_in_i}\right)^2 + \text{terms independent of } n_i.$$

Thus $\phi$ is minimum when

$$\frac{w_iS_i}{\sqrt{n_i}} = \lambda\sqrt{C_in_i} \ \text{ for all } i, \quad \text{i.e.,} \quad n_i = \frac{1}{\lambda}\,\frac{w_iS_i}{\sqrt{C_i}}.$$

How to determine $\lambda$?

There are two ways to determine $\lambda$:
i. minimize the variability for fixed cost;
ii. minimize the cost for given variability.

We consider both cases.
i. Minimize variability for fixed cost:

Let $C = C_0^*$ be the fixed cost. Then

$$\sum_{i=1}^{k} C_in_i = C_0^* \ \Rightarrow\ \sum_{i=1}^{k} C_i\,\frac{w_iS_i}{\lambda\sqrt{C_i}} = C_0^* \ \Rightarrow\ \lambda = \frac{\sum_{i=1}^{k}\sqrt{C_i}\,w_iS_i}{C_0^*}.$$

Substituting $\lambda$ in the expression $n_i = \frac{1}{\lambda}\frac{w_iS_i}{\sqrt{C_i}}$, the optimum $n_i$ is obtained as

$$n_i^* = \frac{w_iS_i}{\sqrt{C_i}}\left(\frac{C_0^*}{\sum_{i=1}^{k}\sqrt{C_i}\,w_iS_i}\right).$$

The required sample size to estimate $\bar Y$ such that the variance is minimum for the given cost $C = C_0^*$ is $n = \sum_{i=1}^{k} n_i^*$; a small numerical sketch is given below.
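A minimal sketch of this fixed-cost allocation, assuming hypothetical weights, SDs and per-unit costs:

```python
import numpy as np

# Hypothetical inputs: weights w_i, SDs S_i, per-unit costs C_i, fixed budget C0
w  = np.array([0.4, 0.3, 0.3])
S  = np.array([5.0, 15.0, 10.0])
Ci = np.array([4.0, 9.0, 16.0])
C0 = 600.0

# Optimum n_i* = (w_i S_i / sqrt(C_i)) * C0 / sum(sqrt(C_i) w_i S_i)
n_opt = (w * S / np.sqrt(Ci)) * C0 / np.sum(np.sqrt(Ci) * w * S)
print(n_opt.round(1), "total n =", n_opt.sum().round(1),
      "cost =", np.sum(Ci * n_opt).round(1))   # the spent cost equals C0
```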
ii. Minimize cost for given variability:

Let $V = V_0$ be the prespecified variance. Now determine $n_i$ such that

$$\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)w_i^2S_i^2 = V_0 \ \Rightarrow\ \sum_{i=1}^{k}\frac{w_i^2S_i^2}{n_i} = V_0 + \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}.$$

Substituting $n_i = \frac{1}{\lambda}\frac{w_iS_i}{\sqrt{C_i}}$ gives

$$\sum_{i=1}^{k}\lambda\sqrt{C_i}\,w_iS_i = V_0 + \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i} \ \Rightarrow\ \lambda = \frac{V_0 + \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}}{\sum_{i=1}^{k} w_iS_i\sqrt{C_i}}.$$

Thus the optimum $n_i$ is

$$n_i = \frac{w_iS_i}{\sqrt{C_i}}\left(\frac{\sum_{i=1}^{k} w_iS_i\sqrt{C_i}}{V_0 + \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}}\right).$$

So the required sample size to estimate $\bar Y$ such that the cost C is minimum for a prespecified variance $V_0$ is $n = \sum_{i=1}^{k} n_i$.
Sample size under proportional allocation for fixed cost and for fixed variance

(i) If the cost $C = C_0$ is fixed, then $C_0 = \sum_{i=1}^{k} C_in_i$. Under proportional allocation, $n_i = \frac{n}{N}N_i = nw_i$, so

$$C_0 = n\sum_{i=1}^{k} w_iC_i \ \Rightarrow\ n = \frac{C_0}{\sum_{i=1}^{k} w_iC_i}.$$

Thus

$$n_i = \frac{C_0w_i}{\sum_{i=1}^{k} w_iC_i}.$$

The required sample size to estimate $\bar Y$ in this case is $n = \sum_{i=1}^{k} n_i$.
Sampling Theory
MODULE IV
LECTURE - 11
STRATIFIED SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Variances under different allocations:

Now we derive the variance of $\bar y_{st}$ under proportional and optimum allocations.

(i) Proportional allocation

Under proportional allocation, $n_i = \frac{n}{N}N_i$ and

$$Var(\bar y_{st}) = \sum_{i=1}^{k}\left(\frac{N_i - n_i}{N_in_i}\right)w_i^2S_i^2,$$

so

$$Var_{prop}(\bar y_{st}) = \sum_{i=1}^{k}\left(\frac{N_i - \frac{n}{N}N_i}{N_i\cdot\frac{n}{N}N_i}\right)\left(\frac{N_i}{N}\right)^2S_i^2 = \frac{N-n}{Nn}\sum_{i=1}^{k}\frac{N_iS_i^2}{N} = \frac{N-n}{Nn}\sum_{i=1}^{k} w_iS_i^2.$$
(ii) Optimum allocation

Under optimum allocation,

$$n_i = \frac{nN_iS_i}{\sum_{i=1}^{k} N_iS_i},$$

so

$$Var_{opt}(\bar y_{st}) = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)w_i^2S_i^2 = \sum_{i=1}^{k}\frac{w_i^2S_i^2}{n_i} - \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}$$
$$= \sum_{i=1}^{k}\left[w_i^2S_i^2\,\frac{\sum_{j=1}^{k} N_jS_j}{nN_iS_i}\right] - \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i} = \frac{1}{n}\sum_{i=1}^{k}\left[\frac{N_iS_i}{N^2}\sum_{j=1}^{k} N_jS_j\right] - \sum_{i=1}^{k}\frac{w_i^2S_i^2}{N_i}$$
$$= \frac{1}{n}\left(\sum_{i=1}^{k} w_iS_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_iS_i^2.$$
Comparison of the variance of the sample mean under SRS with the stratified mean under proportional and optimum allocation:

(a) Proportional allocation

$$Var_{SRS}(\bar y) = \frac{N-n}{Nn}S^2, \qquad Var_{prop}(\bar y_{st}) = \frac{N-n}{Nn}\sum_{i=1}^{k}\frac{N_iS_i^2}{N}.$$

In order to compare $Var_{SRS}(\bar y)$ and $Var_{prop}(\bar y_{st})$, we first express $S^2$ as a function of the $S_i^2$. Consider

$$(N-1)S^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}(y_{ij} - \bar Y)^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}\left[(y_{ij} - \bar Y_i) + (\bar Y_i - \bar Y)\right]^2$$
$$= \sum_{i=1}^{k}\sum_{j=1}^{N_i}(y_{ij} - \bar Y_i)^2 + \sum_{i=1}^{k} N_i(\bar Y_i - \bar Y)^2 = \sum_{i=1}^{k}(N_i - 1)S_i^2 + \sum_{i=1}^{k} N_i(\bar Y_i - \bar Y)^2,$$

so

$$\frac{N-1}{N}S^2 = \sum_{i=1}^{k}\frac{N_i - 1}{N}S_i^2 + \sum_{i=1}^{k}\frac{N_i}{N}(\bar Y_i - \bar Y)^2.$$

For simplification, we assume that $N_i$ is large enough to permit the approximations $\frac{N_i - 1}{N_i} \approx 1$ and $\frac{N-1}{N} \approx 1$. Thus

$$S^2 = \sum_{i=1}^{k}\frac{N_i}{N}S_i^2 + \sum_{i=1}^{k}\frac{N_i}{N}(\bar Y_i - \bar Y)^2,$$

and premultiplying both sides by $\frac{N-n}{Nn}$,

$$\frac{N-n}{Nn}S^2 = \frac{N-n}{Nn}\sum_{i=1}^{k}\frac{N_i}{N}S_i^2 + \frac{N-n}{Nn}\sum_{i=1}^{k}\frac{N_i}{N}(\bar Y_i - \bar Y)^2$$
$$\Rightarrow\ Var_{SRS}(\bar y) = Var_{prop}(\bar y_{st}) + \frac{N-n}{Nn}\sum_{i=1}^{k} w_i(\bar Y_i - \bar Y)^2.$$

Since $\sum_{i=1}^{k} w_i(\bar Y_i - \bar Y)^2 \ge 0$,

$$Var_{prop}(\bar y_{st}) \le Var_{SRS}(\bar y).$$

A larger gain is achieved when the $\bar Y_i$ differ more from $\bar Y$.
(b) Optimum allocation

$$Var_{opt}(\bar y_{st}) = \frac{1}{n}\left(\sum_{i=1}^{k} w_iS_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_iS_i^2.$$

Consider

$$Var_{prop}(\bar y_{st}) - Var_{opt}(\bar y_{st}) = \left(\frac{N-n}{Nn}\right)\sum_{i=1}^{k} w_iS_i^2 - \left[\frac{1}{n}\left(\sum_{i=1}^{k} w_iS_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_iS_i^2\right]$$
$$= \frac{1}{n}\left[\sum_{i=1}^{k} w_iS_i^2 - \left(\sum_{i=1}^{k} w_iS_i\right)^2\right] = \frac{1}{n}\sum_{i=1}^{k} w_i(S_i - \bar S)^2, \quad \text{where } \bar S = \sum_{i=1}^{k} w_iS_i.$$

Hence

$$Var_{prop}(\bar y_{st}) - Var_{opt}(\bar y_{st}) \ge 0, \quad \text{or} \quad Var_{opt}(\bar y_{st}) \le Var_{prop}(\bar y_{st}).$$

A larger gain in efficiency is achieved when the $S_i$ differ more from $\bar S$. Combining the results in (a) and (b), we have

$$Var_{opt}(\bar y_{st}) \le Var_{prop}(\bar y_{st}) \le Var_{SRS}(\bar y),$$

as illustrated numerically in the sketch below.
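A minimal Python check of this ordering, using hypothetical population summaries and the large-$N_i$ decomposition of $S^2$ derived above:

```python
import numpy as np

# Hypothetical population summary: stratum weights, SDs S_i, and stratum means
w  = np.array([0.4, 0.3, 0.3])
S  = np.array([5.0, 15.0, 10.0])
Ybar_i = np.array([20.0, 50.0, 35.0])
N, n = 1000, 60
Ybar = np.sum(w * Ybar_i)
S2 = np.sum(w * S**2) + np.sum(w * (Ybar_i - Ybar)**2)   # large-N_i approximation

f = (N - n) / (N * n)
var_srs  = f * S2
var_prop = f * np.sum(w * S**2)
var_opt  = (np.sum(w * S)**2) / n - np.sum(w * S**2) / N
print(var_srs, var_prop, var_opt)   # satisfies var_opt <= var_prop <= var_srs
```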
Estimate of variance and confidence intervals

Under SRSWOR, an unbiased estimate of $S_i^2$ for the ith stratum $(i = 1, 2, \ldots, k)$ is

$$s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(y_{ij} - \bar y_i)^2.$$

In stratified sampling,

$$Var(\bar y_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{N_i - n_i}{N_in_i}S_i^2,$$

so an unbiased estimate of $Var(\bar y_{st})$ is

$$\widehat{Var}(\bar y_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{N_i - n_i}{N_in_i}s_i^2 = \sum_{i=1}^{k}\frac{w_i^2s_i^2}{n_i} - \sum_{i=1}^{k}\frac{w_i^2s_i^2}{N_i} = \sum_{i=1}^{k}\frac{w_i^2s_i^2}{n_i} - \frac{1}{N}\sum_{i=1}^{k} w_is_i^2.$$

The second term represents the reduction due to the finite population correction. The confidence limits of $\bar Y$ can be obtained as

$$\bar y_{st} \pm t\sqrt{\widehat{Var}(\bar y_{st})},$$

assuming that $\bar y_{st}$ is normally distributed and $\widehat{Var}(\bar y_{st})$ is well determined, so that t can be read from normal distribution tables. If only a few degrees of freedom are provided by each stratum, then the t values are obtained from the tables of Student's t-distribution.
The distribution of $\widehat{Var}(\bar y_{st})$ is generally complex. An approximate method of assigning an effective number of degrees of freedom $n_e$ to $\widehat{Var}(\bar y_{st})$ is

$$n_e = \frac{\left(\sum_{i=1}^{k} g_is_i^2\right)^2}{\sum_{i=1}^{k}\dfrac{g_i^2s_i^4}{n_i - 1}} \quad \text{where} \quad g_i = \frac{N_i(N_i - n_i)}{n_i},$$

with

$$\min_i(n_i - 1) \le n_e \le \sum_{i=1}^{k}(n_i - 1),$$

assuming that the $y_{ij}$ are normally distributed. A short computational sketch follows.
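The effective degrees of freedom are straightforward to compute; a minimal sketch with hypothetical stratum summaries:

```python
import numpy as np

# Effective degrees of freedom for Var_hat(y_st); hypothetical stratum summaries
N_i = np.array([40, 55, 30])
n_i = np.array([4, 5, 3])
s2  = np.array([2.1, 3.4, 4.0])   # sample variances s_i^2

g = N_i * (N_i - n_i) / n_i
n_e = np.sum(g * s2)**2 / np.sum(g**2 * s2**2 / (n_i - 1))
print(n_e)   # lies between min(n_i - 1) and sum(n_i - 1)
```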
Modification of optimum allocation

Sometimes in optimum allocation, the computed size of a subsample exceeds the corresponding stratum size. In such a case,
• replace that $n_i$ by $N_i$ and
• recompute the rest of the $n_i$'s by the revised allocation.

For example, if $n_1 > N_1$, then take the revised $n_i$'s as

$$n_1 = N_1 \quad \text{and} \quad n_i = \frac{(n - N_1)\,w_iS_i}{\sum_{i=2}^{k} w_iS_i},\ i = 2, 3, \ldots, k,$$

provided $n_i \le N_i$ for all $i = 2, 3, \ldots, k$. Suppose in the revised allocation we find that $n_2 > N_2$; then the revised allocation would be

$$n_1 = N_1, \quad n_2 = N_2, \quad n_i = \frac{(n - N_1 - N_2)\,w_iS_i}{\sum_{i=3}^{k} w_iS_i},\ i = 3, 4, \ldots, k,$$

provided $n_i < N_i$ for all $i = 3, 4, \ldots, k$. We continue this process until every $n_i < N_i$.

In such cases, the formula for the minimum variance of $\bar y_{st}$ needs to be modified as

$$\min Var(\bar y_{st}) = \frac{\left(\sum^* w_iS_i\right)^2}{n^*} - \frac{\sum^* w_iS_i^2}{N}$$

where $\sum^*$ denotes the summation over the strata in which $n_i \le N_i$ and $n^*$ is the revised total sample size in those strata.
Sampling Theory
MODULE IV
LECTURE - 12
STRATIFIED SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Stratified sampling for proportions

If the characteristic under study is qualitative in nature, then its values will fall into one of two mutually exclusive complementary classes C and C′. Ideally, only two strata would be needed, into which all the units could be divided depending on whether they belong to C or its complement C′. This is difficult to achieve in practice, so the strata are constructed such that the proportion in C varies as much as possible among strata.

Let
$P_i = \frac{A_i}{N_i}$ : proportion of units in C in the ith stratum,
$p_i = \frac{a_i}{n_i}$ : proportion of units in C in the sample drawn from the ith stratum.

An estimate of the population proportion based on stratified sampling is

$$p_{st} = \sum_{i=1}^{k}\frac{N_ip_i}{N},$$

which is based on the indicator variable

$$y_{ij} = \begin{cases}1 & \text{when the jth unit belonging to the ith stratum is in } C\\ 0 & \text{otherwise,}\end{cases}$$

with $\bar y_{st} = p_{st}$. Here

$$S_i^2 = \frac{N_i}{N_i - 1}P_iQ_i, \quad \text{where } Q_i = 1 - P_i.$$
Also,

$$Var(\bar y_{st}) = \sum_{i=1}^{k}\frac{N_i - n_i}{N_in_i}w_i^2S_i^2,$$

so

$$Var(p_{st}) = \frac{1}{N^2}\sum_{i=1}^{k}\frac{N_i^2(N_i - n_i)}{N_i - 1}\,\frac{P_iQ_i}{n_i}.$$

If the finite population correction can be ignored, then

$$Var(p_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{P_iQ_i}{n_i}.$$

If proportional allocation is used for $n_i$, then the variance of $p_{st}$ is

$$Var_{prop}(p_{st}) = \frac{N-n}{N}\,\frac{1}{Nn}\sum_{i=1}^{k}\frac{N_i^2}{N_i - 1}P_iQ_i \approx \frac{N-n}{Nn}\sum_{i=1}^{k} w_iP_iQ_i \quad \left(\text{using } \frac{N_i}{N_i - 1} \approx 1\right),$$

and its estimate is

$$\widehat{Var}_{prop}(p_{st}) = \frac{N-n}{Nn}\sum_{i=1}^{k} w_i\,\frac{n_ip_iq_i}{n_i - 1}.$$

The best choice of $n_i$, minimizing the variance for a fixed total sample size, is

$$n_i \propto N_i\sqrt{\frac{N_iP_iQ_i}{N_i - 1}} \approx N_i\sqrt{P_iQ_i},$$

giving

$$n_i = n\,\frac{N_i\sqrt{P_iQ_i}}{\sum_{i=1}^{k} N_i\sqrt{P_iQ_i}}.$$

Similarly, the best choice of $n_i$ such that the variance is minimum for fixed cost $C = C_0 + \sum_{i=1}^{k} C_in_i$ is

$$n_i = n\,\frac{N_i\sqrt{P_iQ_i/C_i}}{\sum_{i=1}^{k} N_i\sqrt{P_iQ_i/C_i}}.$$
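A minimal Python sketch of these two allocation rules for proportions, with hypothetical stratum sizes, proportions and costs:

```python
import numpy as np

# Hypothetical strata: sizes, class proportions P_i, per-unit costs C_i
N_i = np.array([400, 300, 300])
P_i = np.array([0.10, 0.50, 0.30])
C_i = np.array([4.0, 9.0, 16.0])
n = 100

Q_i = 1 - P_i
# Allocation minimizing Var(p_st) for fixed n, and for fixed cost
n_fixed_n    = n * N_i * np.sqrt(P_i * Q_i) / np.sum(N_i * np.sqrt(P_i * Q_i))
n_fixed_cost = n * N_i * np.sqrt(P_i * Q_i / C_i) / np.sum(N_i * np.sqrt(P_i * Q_i / C_i))
print(n_fixed_n.round(1), n_fixed_cost.round(1))
```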
Estimation of the gain in precision due to stratification

An obvious question that crops up is: what is the advantage of stratifying a population, in the sense that instead of using SRS, the population is divided into various strata? This is answered by estimating the variances of the estimators of the population mean under SRS (without stratification) and under stratified sampling, and evaluating

$$\frac{\widehat{Var}_{SRS}(\bar y) - \widehat{Var}(\bar y_{st})}{\widehat{Var}(\bar y_{st})}.$$

Since

$$Var_{SRS}(\bar y) = \frac{N-n}{Nn}S^2,$$

how do we estimate $S^2$ based on a stratified sample? Consider

$$(N-1)S^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}(y_{ij} - \bar Y)^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}\left[(y_{ij} - \bar Y_i) + (\bar Y_i - \bar Y)\right]^2$$
$$= \sum_{i=1}^{k}(N_i - 1)S_i^2 + \sum_{i=1}^{k} N_i(\bar Y_i - \bar Y)^2 = \sum_{i=1}^{k}(N_i - 1)S_i^2 + N\left(\sum_{i=1}^{k} w_i\bar Y_i^2 - \bar Y^2\right).$$
In order to estimate $S^2$, we need estimates of $S_i^2$, $\bar Y_i^2$ and $\bar Y^2$. We consider their estimation one by one.

(I) For an estimate of $S_i^2$, we have $E(s_i^2) = S_i^2$, so $\hat S_i^2 = s_i^2$.

(II) For an estimate of $\bar Y_i^2$, we know

$$Var(\bar y_i) = E(\bar y_i^2) - [E(\bar y_i)]^2 = E(\bar y_i^2) - \bar Y_i^2 \ \Rightarrow\ \hat{\bar Y}_i^2 = \bar y_i^2 - \widehat{Var}(\bar y_i) = \bar y_i^2 - \frac{N_i - n_i}{N_in_i}s_i^2.$$

(III) For an estimate of $\bar Y^2$, we know

$$Var(\bar y_{st}) = E(\bar y_{st}^2) - [E(\bar y_{st})]^2 = E(\bar y_{st}^2) - \bar Y^2,$$

so an estimate of $\bar Y^2$ is

$$\hat{\bar Y}^2 = \bar y_{st}^2 - \widehat{Var}(\bar y_{st}) = \bar y_{st}^2 - \sum_{i=1}^{k}\frac{N_i - n_i}{N_in_i}w_i^2s_i^2.$$

Substituting these estimates in the expression for $(N-1)S^2$, the estimate of $S^2$ is obtained as

$$\hat S^2 = \frac{1}{N-1}\sum_{i=1}^{k}(N_i - 1)s_i^2 + \frac{N}{N-1}\left[\sum_{i=1}^{k} w_i\left(\bar y_i^2 - \frac{N_i - n_i}{N_in_i}s_i^2\right) - \left(\bar y_{st}^2 - \sum_{i=1}^{k}\frac{N_i - n_i}{N_in_i}w_i^2s_i^2\right)\right]$$
$$= \frac{1}{N-1}\sum_{i=1}^{k}(N_i - 1)s_i^2 + \frac{N}{N-1}\left[\sum_{i=1}^{k} w_i(\bar y_i - \bar y_{st})^2 - \sum_{i=1}^{k} w_i(1 - w_i)\frac{N_i - n_i}{N_in_i}s_i^2\right].$$

Thus

$$\widehat{Var}_{SRS}(\bar y) = \frac{N-n}{Nn}\hat S^2 = \frac{N-n}{Nn(N-1)}\sum_{i=1}^{k}(N_i - 1)s_i^2 + \frac{N-n}{n(N-1)}\left[\sum_{i=1}^{k} w_i(\bar y_i - \bar y_{st})^2 - \sum_{i=1}^{k} w_i(1 - w_i)\frac{N_i - n_i}{N_in_i}s_i^2\right]$$

and

$$\widehat{Var}(\bar y_{st}) = \sum_{i=1}^{k}\frac{N_i - n_i}{N_in_i}w_i^2s_i^2.$$

Substituting these expressions in $\frac{\widehat{Var}_{SRS}(\bar y) - \widehat{Var}(\bar y_{st})}{\widehat{Var}(\bar y_{st})}$, the gain in efficiency due to stratification can be obtained. If any other particular allocation is used, then substituting the appropriate $n_i$ under that allocation, such gain can be estimated.
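A minimal Python sketch of this gain-in-precision calculation, using the same hypothetical stratum summaries as earlier:

```python
import numpy as np

# Relative gain from stratification, estimated from the stratified sample alone.
N_i = np.array([40, 55, 30]); n_i = np.array([4, 5, 3])
ybar = np.array([20.0, 50.0, 35.0]); s2 = np.array([2.1, 3.4, 4.0])
N, n = N_i.sum(), n_i.sum()
w = N_i / N
y_st = np.sum(w * ybar)
fpc = (N_i - n_i) / (N_i * n_i)

var_st = np.sum(w**2 * fpc * s2)                       # Var_hat(y_st)
S2_hat = (np.sum((N_i - 1) * s2)                       # estimate of S^2
          + N * (np.sum(w * (ybar - y_st)**2)
                 - np.sum(w * (1 - w) * fpc * s2))) / (N - 1)
var_srs = (N - n) / (N * n) * S2_hat                   # Var_hat_SRS(y)
print("relative gain:", (var_srs - var_st) / var_st)
```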
Interpenetrating subsampling

Suppose a sample consists of two or more subsamples which are drawn according to the same sampling scheme, such that each subsample yields an estimate of the parameter. Such subsamples are called interpenetrating subsamples.

The subsamples need not necessarily be independent. The assumption of independent subsamples helps in obtaining an unbiased estimate of the variance of the composite estimator. This is helpful even when the sample design is complicated and the expression for the variance of the composite estimator is complex.

Let there be g independent interpenetrating subsamples and let $t_1, t_2, \ldots, t_g$ be g unbiased estimators of the parameter $\theta$, where $t_j$ $(j = 1, 2, \ldots, g)$ is based on the jth interpenetrating subsample. Then an unbiased estimator of $\theta$ is given by

$$\hat\theta = \frac{1}{g}\sum_{j=1}^{g} t_j = \bar t, \text{ say},$$

with $E(\hat\theta) = E(\bar t) = \theta$ and

$$\widehat{Var}(\hat\theta) = \widehat{Var}(\bar t) = \frac{1}{g(g-1)}\sum_{j=1}^{g}(t_j - \bar t)^2.$$

Note that

$$E\left[\widehat{Var}(\bar t)\right] = \frac{1}{g(g-1)}E\left[\sum_{j=1}^{g}(t_j - \theta)^2 - g(\bar t - \theta)^2\right] = \frac{1}{g(g-1)}\left[\sum_{j=1}^{g} Var(t_j) - g\,Var(\bar t)\right] = \frac{1}{g(g-1)}(g^2 - g)Var(\bar t) = Var(\bar t).$$

If the distribution of each estimator $t_j$ is symmetric about $\theta$, then a confidence interval for $\theta$ can be obtained via

$$P\left[\min(t_1, t_2, \ldots, t_g) < \theta < \max(t_1, t_2, \ldots, t_g)\right] = 1 - \left(\frac{1}{2}\right)^{g-1}.$$
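A minimal sketch of these formulas, assuming hypothetical subsample estimates $t_j$:

```python
import numpy as np

# Hypothetical estimates t_j from g independent interpenetrating subsamples
t = np.array([102.0, 98.5, 101.2, 99.8])
g = len(t)

theta_hat = t.mean()                                  # composite estimator t_bar
var_hat = np.sum((t - theta_hat)**2) / (g * (g - 1))  # unbiased variance estimate
ci_coverage = 1 - 0.5**(g - 1)                        # coverage of (min t, max t)
print(theta_hat, var_hat, (t.min(), t.max()), ci_coverage)
```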
Sampling Theory
MODULE IV
LECTURE - 13
STRATIFIED SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Implementation of interpenetrating subsamples in stratified sampling

Consider the setup of stratified sampling. Suppose that each stratum provides independent interpenetrating subsamples, so from each stratum L independent interpenetrating subsamples are drawn according to the same sampling scheme.

Let $\hat Y_{ij(tot)}$ be the unbiased estimator of the total of the jth stratum based on the ith subsample, $i = 1, 2, \ldots, L$; $j = 1, 2, \ldots, k$. An unbiased estimator of the jth stratum total is given by

$$\hat Y_{j(tot)} = \frac{1}{L}\sum_{i=1}^{L}\hat Y_{ij(tot)}$$

and an unbiased estimator of the variance of $\hat Y_{j(tot)}$ is given by

$$\widehat{Var}(\hat Y_{j(tot)}) = \frac{1}{L(L-1)}\sum_{i=1}^{L}\left(\hat Y_{ij(tot)} - \hat Y_{j(tot)}\right)^2.$$

Thus an unbiased estimator of the population total $Y_{tot}$ is

$$\hat Y_{tot} = \sum_{j=1}^{k}\hat Y_{j(tot)} = \frac{1}{L}\sum_{i=1}^{L}\sum_{j=1}^{k}\hat Y_{ij(tot)}$$

and an unbiased estimator of its variance is

$$\widehat{Var}(\hat Y_{tot}) = \sum_{j=1}^{k}\widehat{Var}(\hat Y_{j(tot)}) = \frac{1}{L(L-1)}\sum_{i=1}^{L}\sum_{j=1}^{k}\left(\hat Y_{ij(tot)} - \hat Y_{j(tot)}\right)^2.$$
Post stratification

Sometimes the stratum to which a unit belongs may be known only after the field survey. For example, the age of persons, their educational qualifications etc. cannot be known in advance. In such cases, we adopt the post stratification procedure to increase the precision of the estimates.

In post stratification,
• draw a sample by simple random sampling from the population and carry out the survey;
• after the completion of the survey, stratify the sampling units to increase the precision of the estimates.

Assume that the stratum sizes $N_i$ are fairly accurately known. Let

$m_i$ : number of sampled units falling in the ith stratum, $i = 1, 2, \ldots, k$, with $\sum_{i=1}^{k} m_i = n$.

Note that $m_i$ is a random variable (and that is why we are not using the symbol $n_i$ as earlier). Assume that n is large enough, or that the stratification is such that the probability that some $m_i = 0$ is negligibly small. In case $m_i = 0$ for some strata, two or more strata can be combined to make the sample size non-zero before evaluating the final estimates.

A post stratified estimator of the population mean $\bar Y$ is

$$\bar y_{post} = \frac{1}{N}\sum_{i=1}^{k} N_i\bar y_i.$$
Now

$$E(\bar y_{post}) = E\left[\frac{1}{N}\sum_{i=1}^{k} N_i\,E(\bar y_i \mid m_1, m_2, \ldots, m_k)\right] = E\left[\frac{1}{N}\sum_{i=1}^{k} N_i\bar Y_i\right] = \bar Y,$$

and

$$Var(\bar y_{post}) = E\left[Var(\bar y_{post} \mid m_1, \ldots, m_k)\right] + Var\left[E(\bar y_{post} \mid m_1, \ldots, m_k)\right]$$
$$= E\left[\sum_{i=1}^{k} w_i^2\left(\frac{1}{m_i} - \frac{1}{N_i}\right)S_i^2\right] + Var(\bar Y) = \sum_{i=1}^{k} w_i^2\left[E\left(\frac{1}{m_i}\right) - \frac{1}{N_i}\right]S_i^2 \quad (\text{since } Var(\bar Y) = 0).$$

To find $E\left(\frac{1}{m_i}\right) - \frac{1}{N_i}$, proceed as follows. Consider the estimate of the ratio based on the ratio method of estimation,

$$\hat R = \frac{\bar y}{\bar x} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j}, \qquad R = \frac{\bar Y}{\bar X} = \frac{\sum_{j=1}^{N} Y_j}{\sum_{j=1}^{N} X_j}.$$

We know that

$$E(\hat R) - R = \frac{N-n}{Nn}\cdot\frac{RS_X^2 - S_{XY}}{\bar X^2}.$$

Let

$$x_j = \begin{cases}1 & \text{if the jth unit belongs to the ith stratum}\\ 0 & \text{otherwise}\end{cases}$$

and $y_j = 1$ for all $j = 1, 2, \ldots, N$. Then $R$, $\hat R$, $S_X^2$ and $S_{XY}$ reduce to

$$\hat R = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j} = \frac{n}{m_i}, \qquad R = \frac{\sum_{j=1}^{N} y_j}{\sum_{j=1}^{N} x_j} = \frac{N}{N_i},$$

$$S_X^2 = \frac{1}{N-1}\left[\sum_{j=1}^{N} x_j^2 - N\bar x^2\right] = \frac{1}{N-1}\left(N_i - \frac{N_i^2}{N}\right), \qquad S_{XY} = \frac{1}{N-1}\left[\sum_{j=1}^{N} x_jy_j - N\bar x\bar y\right] = \frac{1}{N-1}\left(N_i - N_i\right) = 0.$$

Using these values in $E(\hat R) - R$, we have

$$E\left(\frac{n}{m_i}\right) - \frac{N}{N_i} = \frac{N(N-n)(N-N_i)}{nN_i^2(N-1)}.$$

Thus

$$E\left(\frac{1}{m_i}\right) - \frac{1}{N_i} = \frac{N}{nN_i} + \frac{N(N-n)(N-N_i)}{n^2N_i^2(N-1)} - \frac{1}{N_i} \approx \frac{(N-n)N}{n(N-1)N_i}\left(1 + \frac{N}{nN_i} - \frac{1}{n}\right)$$

(using $\frac{N}{N-1} \approx 1$ in the leading term). Now substitute this in the expression for $Var(\bar y_{post})$.
$$Var(\bar y_{post}) = \sum_{i=1}^{k} w_i^2\left[E\left(\frac{1}{m_i}\right) - \frac{1}{N_i}\right]S_i^2 = \sum_{i=1}^{k} w_i^2S_i^2\left[\frac{N-n}{(N-1)n}\,\frac{N}{N_i}\left(1 + \frac{N}{nN_i} - \frac{1}{n}\right)\right]$$
$$= \frac{N-n}{n(N-1)}\sum_{i=1}^{k} w_iS_i^2\left(1 + \frac{1}{nw_i} - \frac{1}{n}\right) = \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k} w_iS_i^2\left(n - 1 + \frac{1}{w_i}\right)$$
$$= \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k}(nw_i + 1 - w_i)S_i^2 = \frac{N-n}{n(N-1)}\sum_{i=1}^{k} w_iS_i^2 + \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k}(1 - w_i)S_i^2.$$

Assuming $N - 1 \approx N$,

$$Var(\bar y_{post}) = \frac{N-n}{Nn}\sum_{i=1}^{k} w_iS_i^2 + \frac{N-n}{Nn^2}\sum_{i=1}^{k}(1 - w_i)S_i^2 = Var_{prop}(\bar y_{st}) + \frac{N-n}{Nn^2}\sum_{i=1}^{k}(1 - w_i)S_i^2.$$

The second term is the contribution to the variance of $\bar y_{post}$ due to the $m_i$'s not being proportionately distributed.

If $S_i^2 \approx S_w^2$, say, for all i, then the last term is

$$\frac{N-n}{Nn^2}\sum_{i=1}^{k}(1 - w_i)S_w^2 = \frac{N-n}{Nn^2}(k-1)S_w^2 \quad \left(\text{since } \sum_{i=1}^{k} w_i = 1\right) = \frac{k-1}{n}\cdot\frac{N-n}{Nn}S_w^2 = \frac{k-1}{n}\,Var_{prop}(\bar y_{st}).$$

The increase in variance over $Var_{prop}(\bar y_{st})$ is therefore small if the average sample size per stratum, $\bar n = n/k$, is reasonably large.

Thus post stratification with a large sample produces an estimator which is almost as precise as the estimator in stratified sampling with proportional allocation.
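A minimal sketch of the post-stratified estimator, assuming hypothetical survey data in which each sampled unit's stratum label is observed only after the survey:

```python
import numpy as np

# Post-stratified estimator from a single SRS; strata known after the survey.
y = np.array([12., 15., 22., 25., 27., 35., 33., 14., 24., 38.])
label = np.array([0, 0, 1, 1, 1, 2, 2, 0, 1, 2])   # observed stratum of each unit
N_i = np.array([40, 55, 30])                       # known stratum sizes
N, n = N_i.sum(), len(y)
w = N_i / N

m_i = np.array([(label == i).sum() for i in range(3)])     # random counts m_i
ybar_i = np.array([y[label == i].mean() for i in range(3)])
y_post = np.sum(w * ybar_i)
print(y_post, m_i)
```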
Sampling Theory
MODULE V
LECTURE - 14
RATIO AND PRODUCT
METHODS OF ESTIMATION

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
An important objective in any statistical estimation procedure is to obtain estimators of the parameters of interest with more precision. It is also well understood that incorporating more information in the estimation procedure yields better estimators, provided the information is valid and proper. Use of such auxiliary information is made through the ratio method of estimation to obtain an improved estimator of the population mean. In the ratio method of estimation, auxiliary information on a variable which is linearly related to the variable under study is utilized to estimate the population mean.

Let Y be the variable under study and X be an auxiliary variable which is correlated with Y. The observations $x_i$ on X and $y_i$ on Y are obtained for each sampling unit. The population mean $\bar X$ of X (or equivalently the population total $X_{tot}$) must be known. For example, the $x_i$'s may be the values of the $y_i$'s from
• some earlier completed census,
• some earlier surveys,
• some characteristic on which it is easy to obtain information, etc.

For example, if $y_i$ is the quantity of fruits produced in the ith plot, then $x_i$ can be the area of the ith plot or the production of fruit in the same plot in the previous year.
Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a random sample of size n on the paired variable (X, Y), drawn, preferably by SRSWOR, from a population of size N. The ratio estimator of the population mean $\bar Y$ is

$$\hat{\bar Y}_R = \frac{\bar y}{\bar x}\bar X = \hat R\bar X,$$

assuming the population mean $\bar X$ is known. The ratio estimator of the population total $Y_{tot} = \sum_{i=1}^{N} Y_i$ is

$$\hat Y_{R(tot)} = \frac{y_{tot}}{x_{tot}}X_{tot}$$

where $X_{tot} = \sum_{i=1}^{N} X_i$ is the population total of X, assumed to be known, and $y_{tot} = \sum_{i=1}^{n} y_i$ and $x_{tot} = \sum_{i=1}^{n} x_i$ are the sample totals of Y and X respectively. $\hat Y_{R(tot)}$ can equivalently be expressed as

$$\hat Y_{R(tot)} = \frac{\bar y}{\bar x}X_{tot} = \hat RX_{tot}.$$

Looking at the structure of the ratio estimators, note that the ratio method estimates the relative change $\frac{Y_{tot}}{X_{tot}}$ that occurred after $(x_i, y_i)$ were observed. It is clear that if the variation among the values of $\frac{y_i}{x_i}$ is nearly the same for all $i = 1, 2, \ldots, n$, then the values of $\frac{y_{tot}}{x_{tot}}$ (or equivalently $\frac{\bar y}{\bar x}$) vary little from sample to sample and the ratio estimator will be of high precision.
Bias and mean squared error of the ratio estimator:

Assume that the random sample $(x_i, y_i)$, $i = 1, 2, \ldots, n$ is drawn by SRSWOR and the population mean $\bar X$ is known. Then, averaging over all $\binom{N}{n}$ possible samples,

$$E(\hat{\bar Y}_R) = \frac{1}{\binom{N}{n}}\sum_{\text{all samples}}\frac{\bar y}{\bar x}\,\bar X \neq \bar Y \ \text{(in general)}.$$

Moreover, it is difficult to find exact expressions for $E\left(\frac{\bar y}{\bar x}\right)$ and $E\left(\frac{\bar y^2}{\bar x^2}\right)$, so we approximate them and proceed as follows. Let

$$\varepsilon_0 = \frac{\bar y - \bar Y}{\bar Y} \ \Rightarrow\ \bar y = (1 + \varepsilon_0)\bar Y, \qquad \varepsilon_1 = \frac{\bar x - \bar X}{\bar X} \ \Rightarrow\ \bar x = (1 + \varepsilon_1)\bar X.$$

Since SRSWOR is being followed,

$$E(\varepsilon_0) = 0, \qquad E(\varepsilon_1) = 0.$$
Further,

$$E(\varepsilon_0^2) = \frac{1}{\bar Y^2}E(\bar y - \bar Y)^2 = \frac{1}{\bar Y^2}\,\frac{N-n}{Nn}S_Y^2 = \frac{f}{n}\,\frac{S_Y^2}{\bar Y^2} = \frac{f}{n}C_Y^2$$

where $f = \frac{N-n}{N}$, $S_Y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar Y)^2$ and $C_Y = \frac{S_Y}{\bar Y}$ is the coefficient of variation related to Y.

Similarly,

$$E(\varepsilon_1^2) = \frac{f}{n}C_X^2,$$

$$E(\varepsilon_0\varepsilon_1) = \frac{1}{\bar X\bar Y}E\left[(\bar x - \bar X)(\bar y - \bar Y)\right] = \frac{1}{\bar X\bar Y}\,\frac{N-n}{Nn}\,\frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar X)(Y_i - \bar Y) = \frac{f}{n}\,\frac{S_{XY}}{\bar X\bar Y} = \frac{f}{n}\,\rho\,\frac{S_X}{\bar X}\,\frac{S_Y}{\bar Y} = \frac{f}{n}\rho C_XC_Y,$$

where $C_X = \frac{S_X}{\bar X}$ is the coefficient of variation related to X and $\rho$ is the correlation coefficient between X and Y.
Writing $\hat{\bar Y}_R$ in terms of the $\varepsilon$'s, we get

$$\hat{\bar Y}_R = \frac{\bar y}{\bar x}\bar X = \frac{(1 + \varepsilon_0)\bar Y}{(1 + \varepsilon_1)\bar X}\bar X = (1 + \varepsilon_0)(1 + \varepsilon_1)^{-1}\bar Y.$$

Assuming $|\varepsilon_1| < 1$, the term $(1 + \varepsilon_1)^{-1}$ may be expanded as an infinite series which is convergent. Such an assumption means that $\left|\frac{\bar x - \bar X}{\bar X}\right| < 1$, i.e., the possible estimate $\bar x$ of the population mean $\bar X$ lies between 0 and $2\bar X$. This is likely to hold true if the variation in $\bar x$ is not large. In order to ensure that the variation in $\bar x$ is small, assume that the sample size n is fairly large. With this assumption,

$$\hat{\bar Y}_R = \bar Y(1 + \varepsilon_0)(1 - \varepsilon_1 + \varepsilon_1^2 - \ldots) = \bar Y(1 + \varepsilon_0 - \varepsilon_1 + \varepsilon_1^2 - \varepsilon_0\varepsilon_1 + \ldots).$$

So the estimation error of $\hat{\bar Y}_R$ is

$$\hat{\bar Y}_R - \bar Y = \bar Y(\varepsilon_0 - \varepsilon_1 + \varepsilon_1^2 - \varepsilon_0\varepsilon_1 + \ldots).$$

When the sample size is large, $\varepsilon_0$ and $\varepsilon_1$ are likely to be small quantities, and so the terms involving second and higher powers of $\varepsilon_0$ and $\varepsilon_1$ are negligibly small.
In such a case,

$$\hat{\bar Y}_R - \bar Y \approx \bar Y(\varepsilon_0 - \varepsilon_1) \quad \text{and} \quad E(\hat{\bar Y}_R - \bar Y) = 0.$$

So the ratio estimator is an unbiased estimator of the population mean up to the first order of approximation.

If we assume that only terms of $\varepsilon_0$ and $\varepsilon_1$ involving powers of more than two are negligibly small (which is more realistic than assuming that powers of more than one are negligibly small), then the estimation error of $\hat{\bar Y}_R$ can be approximated as

$$\hat{\bar Y}_R - \bar Y \approx \bar Y(\varepsilon_0 - \varepsilon_1 + \varepsilon_1^2 - \varepsilon_0\varepsilon_1)$$

and

$$Bias(\hat{\bar Y}_R) = E(\hat{\bar Y}_R - \bar Y) = \bar Y\left(0 - 0 + \frac{f}{n}C_X^2 - \frac{f}{n}\rho C_XC_Y\right) = \frac{f}{n}\bar YC_X(C_X - \rho C_Y)$$

up to the second order of approximation. The bias generally decreases as the sample size grows large.
The bias of $\hat{\bar Y}_R$ is zero, i.e., $Bias(\hat{\bar Y}_R) = 0$, if

$$E(\varepsilon_1^2 - \varepsilon_0\varepsilon_1) = 0,$$

i.e., if

$$\frac{Var(\bar x)}{\bar X^2} - \frac{Cov(\bar x, \bar y)}{\bar X\bar Y} = 0, \quad \text{or} \quad \frac{1}{\bar X^2}\left[Var(\bar x) - \frac{\bar X}{\bar Y}Cov(\bar x, \bar y)\right] = 0,$$

i.e., if

$$Var(\bar x) - \frac{Cov(\bar x, \bar y)}{R} = 0 \ (\text{assuming } \bar X \neq 0), \quad \text{or} \quad R = \frac{\bar Y}{\bar X} = \frac{Cov(\bar x, \bar y)}{Var(\bar x)},$$

which is satisfied when the regression line of Y on X passes through the origin.

Now, to find the mean squared error, consider

$$MSE(\hat{\bar Y}_R) = E(\hat{\bar Y}_R - \bar Y)^2 = E\left[\bar Y^2(\varepsilon_0 - \varepsilon_1 + \varepsilon_1^2 - \varepsilon_0\varepsilon_1 + \ldots)^2\right] \approx E\left[\bar Y^2(\varepsilon_0^2 + \varepsilon_1^2 - 2\varepsilon_0\varepsilon_1)\right].$$

Under the assumption $|\varepsilon_1| < 1$ and that the terms of $\varepsilon_0$ and $\varepsilon_1$ involving powers of more than two are negligibly small,

$$MSE(\hat{\bar Y}_R) = \bar Y^2\left(\frac{f}{n}C_X^2 + \frac{f}{n}C_Y^2 - \frac{2f}{n}\rho C_XC_Y\right) = \frac{\bar Y^2f}{n}\left(C_X^2 + C_Y^2 - 2\rho C_XC_Y\right)$$

up to the second order of approximation.

Sampling Theory
MODULE V
LECTURE - 15
RATIO AND PRODUCT
METHODS OF ESTIMATION

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Efficiency of the ratio estimator in comparison to SRSWOR

The ratio estimator is a better estimator of $\bar Y$ than the sample mean based on SRSWOR if

$$MSE(\hat{\bar Y}_R) < Var_{SRS}(\bar y),$$

i.e., if

$$\frac{f}{n}\bar Y^2(C_X^2 + C_Y^2 - 2\rho C_XC_Y) < \frac{f}{n}\bar Y^2C_Y^2 \ \Leftrightarrow\ C_X^2 - 2\rho C_XC_Y < 0 \ \Leftrightarrow\ \rho > \frac{1}{2}\,\frac{C_X}{C_Y}.$$

Thus the ratio estimator is more efficient than the sample mean based on SRSWOR if

$$\rho > \frac{1}{2}\,\frac{C_X}{C_Y} \ \text{ if } R > 0 \quad \text{and} \quad \rho < -\frac{1}{2}\,\frac{C_X}{C_Y} \ \text{ if } R < 0.$$

It is clear from this expression that the success of the ratio estimator depends on how close the auxiliary information is to the variable under study.
Upper limit of the bias of the ratio estimator:

Consider

$$Cov(\hat R, \bar x) = E(\hat R\bar x) - E(\hat R)E(\bar x) = E\left(\frac{\bar y}{\bar x}\,\bar x\right) - E(\hat R)E(\bar x) = \bar Y - E(\hat R)\bar X.$$

Thus

$$E(\hat R) = \frac{\bar Y}{\bar X} - \frac{Cov(\hat R, \bar x)}{\bar X} = R - \frac{Cov(\hat R, \bar x)}{\bar X},$$

so

$$Bias(\hat R) = E(\hat R) - R = -\frac{Cov(\hat R, \bar x)}{\bar X} = -\frac{\rho_{\hat R, \bar x}\,\sigma_{\hat R}\,\sigma_{\bar x}}{\bar X}$$

where $\rho_{\hat R, \bar x}$ is the correlation between $\hat R$ and $\bar x$, and $\sigma_{\hat R}$ and $\sigma_{\bar x}$ are the standard errors of $\hat R$ and $\bar x$ respectively. Thus

$$\left|Bias(\hat R)\right| = \frac{\left|\rho_{\hat R, \bar x}\right|\sigma_{\hat R}\sigma_{\bar x}}{\bar X} \le \frac{\sigma_{\hat R}\sigma_{\bar x}}{\bar X} \quad \left(\left|\rho_{\hat R, \bar x}\right| \le 1\right),$$

assuming $\bar X > 0$. Hence

$$\frac{\left|Bias(\hat R)\right|}{\sigma_{\hat R}} \le \frac{\sigma_{\bar x}}{\bar X} \le C_X$$

where $C_X$ is the coefficient of variation of X (note that $\sigma_{\bar x} = \sqrt{f/n}\,S_X \le S_X$). If $C_X < 0.1$, then the bias in $\hat R$ may safely be regarded as negligible in relation to the standard error of $\hat R$.
Alternative form of $MSE(\hat{\bar Y}_R)$

Consider

$$\sum_{i=1}^{N}(Y_i - RX_i)^2 = \sum_{i=1}^{N}\left[(Y_i - \bar Y) + (\bar Y - RX_i)\right]^2 = \sum_{i=1}^{N}\left[(Y_i - \bar Y) - R(X_i - \bar X)\right]^2 \quad (\text{using } \bar Y = R\bar X)$$
$$= \sum_{i=1}^{N}(Y_i - \bar Y)^2 + R^2\sum_{i=1}^{N}(X_i - \bar X)^2 - 2R\sum_{i=1}^{N}(X_i - \bar X)(Y_i - \bar Y),$$

so

$$\frac{1}{N-1}\sum_{i=1}^{N}(Y_i - RX_i)^2 = S_Y^2 + R^2S_X^2 - 2RS_{XY}.$$

The MSE of $\hat{\bar Y}_R$ has already been derived, and can now be expressed as

$$MSE(\hat{\bar Y}_R) = \frac{f}{n}\bar Y^2\left(C_Y^2 + C_X^2 - 2\rho C_XC_Y\right) = \frac{f}{n}\bar Y^2\left(\frac{S_Y^2}{\bar Y^2} + \frac{S_X^2}{\bar X^2} - 2\frac{S_{XY}}{\bar X\bar Y}\right) = \frac{f}{n}\left(S_Y^2 + R^2S_X^2 - 2RS_{XY}\right)$$
$$= \frac{f}{n(N-1)}\sum_{i=1}^{N}(Y_i - RX_i)^2 = \frac{N-n}{Nn(N-1)}\sum_{i=1}^{N}(Y_i - RX_i)^2.$$
Estimate of $MSE(\hat{\bar Y}_R)$

Let $U_i = Y_i - RX_i$, $i = 1, 2, \ldots, N$; then the MSE of $\hat{\bar Y}_R$ can be expressed as

$$MSE(\hat{\bar Y}_R) = \frac{f}{n}\,\frac{1}{N-1}\sum_{i=1}^{N}(U_i - \bar U)^2 = \frac{f}{n}S_U^2, \quad \text{where } S_U^2 = \frac{1}{N-1}\sum_{i=1}^{N}(U_i - \bar U)^2.$$

Based on this, a natural estimator of $MSE(\hat{\bar Y}_R)$ is

$$\widehat{MSE}(\hat{\bar Y}_R) = \frac{f}{n}s_u^2, \quad \text{where } s_u^2 = \frac{1}{n-1}\sum_{i=1}^{n}(u_i - \bar u)^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left[(y_i - \bar y) - \hat R(x_i - \bar x)\right]^2 = s_y^2 + \hat R^2s_x^2 - 2\hat Rs_{xy}, \quad \hat R = \frac{\bar y}{\bar x}.$$

Based on the expression

$$MSE(\hat{\bar Y}_R) = \frac{f}{n(N-1)}\sum_{i=1}^{N}(Y_i - RX_i)^2,$$

an estimate of $MSE(\hat{\bar Y}_R)$ is

$$\widehat{MSE}(\hat{\bar Y}_R) = \frac{f}{n(n-1)}\sum_{i=1}^{n}(y_i - \hat Rx_i)^2 = \frac{f}{n}\left(s_y^2 + \hat R^2s_x^2 - 2\hat Rs_{xy}\right).$$
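A minimal Python sketch of the ratio estimate and this MSE estimate, assuming a hypothetical paired sample and a known population mean of X:

```python
import numpy as np

# Ratio estimate of the population mean and its estimated MSE (SRSWOR)
x = np.array([4.0, 6.5, 5.2, 7.1, 3.8, 6.0])
y = np.array([9.1, 13.8, 11.0, 15.2, 8.0, 12.9])
N, Xbar = 200, 5.5                  # assumed population size and known X_bar
n = len(x)

R_hat = y.mean() / x.mean()
Y_R = R_hat * Xbar                  # ratio estimate of Y_bar
f = (N - n) / N
u = y - R_hat * x                   # residuals y_i - R_hat * x_i
mse_hat = f / (n * (n - 1)) * np.sum((u - u.mean())**2)
print(Y_R, mse_hat)
```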
Confidence interval of the ratio estimator

If the sample is large enough that the normal approximation is applicable, then the 100(1 − α)% confidence intervals of $\bar Y$ and R are

$$\left[\hat{\bar Y}_R - Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{\bar Y}_R)},\ \hat{\bar Y}_R + Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{\bar Y}_R)}\right] \quad \text{and} \quad \left[\hat R - Z_{\alpha/2}\sqrt{\widehat{Var}(\hat R)},\ \hat R + Z_{\alpha/2}\sqrt{\widehat{Var}(\hat R)}\right]$$

respectively, where $Z_{\alpha/2}$ is the normal deviate chosen for the given value of the confidence coefficient (1 − α).

If $(\bar x, \bar y)$ follows a bivariate normal distribution, then $(\bar y - R\bar x)$ is normally distributed. If SRS is followed for drawing the sample, then, assuming R is known, the statistic

$$\frac{\bar y - R\bar x}{\sqrt{\dfrac{N-n}{Nn}\left(s_y^2 + R^2s_x^2 - 2Rs_{xy}\right)}}$$

is approximately N(0, 1). This can also be used for finding confidence limits; see Cochran (1977, Chapter 6, page 156) for more details.
Sampling Theory
MODULE V
LECTURE - 16
RATIO AND PRODUCT
METHODS OF ESTIMATION

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Conditions under which the ratio estimator is optimum

The ratio estimator $\hat{\bar Y}_R$ is the best linear unbiased estimator of $\bar Y$ when

i. the relationship between $y_i$ and $x_i$ is linear passing through the origin, i.e.,

$$y_i = \beta x_i + e_i,$$

where the $e_i$'s are independent with $E(e_i \mid x_i) = 0$ and $\beta$ is the slope parameter, and

ii. the variance about this line is proportional to $x_i$, i.e.,

$$Var(y_i \mid x_i) = E(e_i^2) = Cx_i$$

where C is a constant.

Proof. Consider a linear estimator $\hat\beta = \sum_{i=1}^{n}\ell_iy_i$ of $\beta$, where $y_i = \beta x_i + e_i$. If the n sample values of $x_i$ are kept fixed, then in repeated sampling

$$E(\hat\beta) = \sum_{i=1}^{n}\ell_ix_i\,\beta \quad \text{and} \quad Var(\hat\beta) = \sum_{i=1}^{n}\ell_i^2\,Var(y_i \mid x_i) = C\sum_{i=1}^{n}\ell_i^2x_i.$$

So $E(\hat\beta) = \beta$ when $\sum_{i=1}^{n}\ell_ix_i = 1$.

Consider the minimization of $Var(\hat\beta)$ subject to the unbiasedness condition $\sum_{i=1}^{n}\ell_ix_i = 1$, using a Lagrangian function.
The Lagrangian function with Lagrangian multiplier $\lambda$ is

$$\varphi = Var(\hat\beta) - 2\lambda\left(\sum_{i=1}^{n}\ell_ix_i - 1\right) = C\sum_{i=1}^{n}\ell_i^2x_i - 2\lambda\left(\sum_{i=1}^{n}\ell_ix_i - 1\right).$$

Now

$$\frac{\partial\varphi}{\partial\ell_i} = 0 \ \Rightarrow\ C\ell_ix_i = \lambda x_i, \ i = 1, 2, \ldots, n \quad \text{and} \quad \frac{\partial\varphi}{\partial\lambda} = 0 \ \Rightarrow\ \sum_{i=1}^{n}\ell_ix_i = 1.$$

The first condition gives $\ell_i = \lambda/C$, the same for all i; using the constraint $\sum_{i=1}^{n}\ell_ix_i = 1$,

$$\frac{\lambda}{C}\sum_{i=1}^{n} x_i = 1 \ \Rightarrow\ \ell_i = \frac{1}{n\bar x},$$

and so

$$\hat\beta = \frac{\sum_{i=1}^{n} y_i}{n\bar x} = \frac{\bar y}{\bar x}.$$

Thus $\hat\beta = \bar y/\bar x$ is best in the class of linear unbiased estimators under the stated conditions.
Alternative approach:

This result can alternatively be derived as follows. The ratio estimator $\hat R = \frac{\bar y}{\bar x}$ is the best linear unbiased estimator of $R = \frac{\bar Y}{\bar X}$ if the following two conditions hold:

i. For fixed x, $E(y) = \beta x$, i.e., the line of regression of y on x is a straight line passing through the origin.

ii. For fixed x, $Var(y \mid x) \propto x$, i.e., $Var(y \mid x) = \lambda x$ where $\lambda$ is the constant of proportionality.

Proof. Let $y = (y_1, y_2, \ldots, y_n)'$ and $x = (x_1, x_2, \ldots, x_n)'$ be the two vectors of observations on the y's and x's. Hence, for any fixed x,

$$E(y) = \beta x, \qquad Var(y) = \Omega = \lambda\,\mathrm{diag}(x_1, x_2, \ldots, x_n)$$

where $\mathrm{diag}(x_1, x_2, \ldots, x_n)$ is the diagonal matrix with $x_1, x_2, \ldots, x_n$ as the diagonal elements.

The best linear unbiased estimator of $\beta$ is obtained by minimizing

$$S^2 = (y - \beta x)'\Omega^{-1}(y - \beta x) = \sum_{i=1}^{n}\frac{(y_i - \beta x_i)^2}{\lambda x_i}.$$

Solving

$$\frac{\partial S^2}{\partial\beta} = 0 \ \Rightarrow\ \sum_{i=1}^{n}(y_i - \hat\beta x_i) = 0 \ \Rightarrow\ \hat\beta = \frac{\bar y}{\bar x} = \hat R.$$

Thus $\hat R$ is the best linear unbiased estimator of R. Consequently, $\hat R\bar X = \hat{\bar Y}_R$ is the best linear unbiased estimator of $\bar Y$.
Ratio estimator in stratified sampling

Suppose a population of size N is divided into k strata. The objective is to estimate the population mean $\bar Y$ using the ratio method of estimation. In such a situation, a random sample of size $n_i$ is drawn by SRSWOR from the ith stratum of size $N_i$ on the variable under study Y and the auxiliary variable X.

Let
$y_{ij}$ : jth observation on Y from the ith stratum,
$x_{ij}$ : jth observation on X from the ith stratum, $i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n_i$.

An estimator of $\bar Y$ based on the philosophy of stratified sampling can be derived in the following two possible ways:

1. Separate ratio estimator

• First employ the ratio method of estimation separately in each stratum and obtain the ratio estimators $\hat{\bar Y}_{Ri}$, $i = 1, 2, \ldots, k$, assuming the stratum means $\bar X_i$ to be known.
• Then combine all the estimates using a weighted arithmetic mean.

This gives the separate ratio estimator as

$$\hat{\bar Y}_{Rs} = \sum_{i=1}^{k}\frac{N_i\hat{\bar Y}_{Ri}}{N} = \sum_{i=1}^{k} w_i\hat{\bar Y}_{Ri} = \sum_{i=1}^{k} w_i\,\frac{\bar y_i}{\bar x_i}\bar X_i$$
where
$\bar y_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$ : sample mean of Y from the ith stratum,
$\bar x_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$ : sample mean of X from the ith stratum,
$\bar X_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{ij}$ : mean of all the X units in the ith stratum.

No assumption is made that the true ratio remains constant from stratum to stratum; the estimator depends on information on each $\bar X_i$.

2. Combined ratio estimator:

• First find the stratified sample means of the Y's and X's as

$$\bar y_{st} = \sum_{i=1}^{k} w_i\bar y_i, \qquad \bar x_{st} = \sum_{i=1}^{k} w_i\bar x_i.$$

• Then define the combined ratio estimator as

$$\hat{\bar Y}_{Rc} = \frac{\bar y_{st}}{\bar x_{st}}\bar X$$

where $\bar X$ is the population mean of X based on all the $N = \sum_{i=1}^{k} N_i$ units. It does not depend on information on each individual $\bar X_i$ but only on $\bar X$.
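A minimal Python sketch contrasting the two estimators, assuming hypothetical per-stratum data and known stratum means of X:

```python
import numpy as np

# Separate vs combined ratio estimators from a stratified sample (SRSWOR)
x = [np.array([4.0, 5.1, 6.2]), np.array([8.0, 9.5, 7.7, 8.8])]
y = [np.array([8.2, 10.0, 12.5]), np.array([15.9, 19.2, 15.0, 17.3])]
N_i = np.array([50, 80]); Xbar_i = np.array([5.0, 8.5])   # known stratum means
w = N_i / N_i.sum()
Xbar = np.sum(w * Xbar_i)                                 # overall mean of X

ybar = np.array([s.mean() for s in y]); xbar = np.array([s.mean() for s in x])
Y_Rs = np.sum(w * ybar / xbar * Xbar_i)                   # separate ratio estimator
Y_Rc = np.sum(w * ybar) / np.sum(w * xbar) * Xbar         # combined ratio estimator
print(Y_Rs, Y_Rc)
```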
Sampling Theory
MODULE V
LECTURE - 17
RATIO AND PRODUCT
METHODS OF ESTIMATION

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Properties of the separate ratio estimator:

Note that there is an analogy between $\bar Y = \sum_{i=1}^{k} w_i\bar Y_i$ and $\hat{\bar Y}_{Rs} = \sum_{i=1}^{k} w_i\hat{\bar Y}_{Ri}$. We have already derived the expectation of $\hat{\bar Y}_R = \frac{\bar y}{\bar x}\bar X$ as

$$E(\hat{\bar Y}_R) = \bar Y + \frac{f}{n}\bar Y(C_X^2 - \rho C_XC_Y).$$

So for $\hat{\bar Y}_{Ri}$ we can write

$$E(\hat{\bar Y}_{Ri}) = \bar Y_i + \frac{f_i}{n_i}\bar Y_i(C_{iX}^2 - \rho_iC_{iX}C_{iY})$$

where

$$\bar Y_i = \frac{1}{N_i}\sum_{j=1}^{N_i} y_{ij}, \quad \bar X_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{ij}, \quad f_i = \frac{N_i - n_i}{N_i}, \quad C_{iY}^2 = \frac{S_{iY}^2}{\bar Y_i^2}, \quad C_{iX}^2 = \frac{S_{iX}^2}{\bar X_i^2},$$

$$S_{iY}^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i}(y_{ij} - \bar Y_i)^2, \qquad S_{iX}^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i}(x_{ij} - \bar X_i)^2,$$

$\rho_i$ : correlation coefficient between the observations on X and Y in the ith stratum,
$C_{iX}$ : coefficient of variation of the X values in the ith stratum.
Thus

$$E(\hat{\bar Y}_{Rs}) = \sum_{i=1}^{k} w_iE(\hat{\bar Y}_{Ri}) = \sum_{i=1}^{k} w_i\left[\bar Y_i + \frac{f_i}{n_i}\bar Y_i\left(C_{iX}^2 - \rho_iC_{iX}C_{iY}\right)\right] = \bar Y + \sum_{i=1}^{k}\frac{w_i\bar Y_if_i}{n_i}\left(C_{iX}^2 - \rho_iC_{iX}C_{iY}\right),$$

so

$$Bias(\hat{\bar Y}_{Rs}) = E(\hat{\bar Y}_{Rs}) - \bar Y = \sum_{i=1}^{k}\frac{w_i\bar Y_if_i}{n_i}C_{iX}(C_{iX} - \rho_iC_{iY}).$$

Assuming the finite population corrections to be approximately 1, $n_i = n/k$, and $C_{iX}$, $C_{iY}$ and $\rho_i$ to be the same over the strata, say $C_x$, $C_y$ and $\rho$ respectively, we have

$$Bias(\hat{\bar Y}_{Rs}) = \frac{k}{n}\bar YC_x(C_x - \rho C_y).$$

Thus the bias is negligible when the sample size within each stratum is sufficiently large, and $\hat{\bar Y}_{Rs}$ is unbiased when $C_{iX} = \rho_iC_{iY}$.
Now we derive the MSE of $\hat{\bar Y}_{Rs}$. We have already derived the MSE of $\hat{\bar Y}_R$ as

$$MSE(\hat{\bar Y}_R) = \frac{f}{n}\bar Y^2\left(C_X^2 + C_Y^2 - 2\rho C_XC_Y\right) = \frac{f}{n(N-1)}\sum_{i=1}^{N}(Y_i - RX_i)^2, \quad \text{where } R = \frac{\bar Y}{\bar X}.$$

Thus for the ith stratum,

$$MSE(\hat{\bar Y}_{Ri}) = \frac{f_i}{n_i}\bar Y_i^2\left(C_{iX}^2 + C_{iY}^2 - 2\rho_iC_{iX}C_{iY}\right) = \frac{f_i}{n_i(N_i - 1)}\sum_{j=1}^{N_i}(y_{ij} - R_ix_{ij})^2,$$

and so

$$MSE(\hat{\bar Y}_{Rs}) = \sum_{i=1}^{k} w_i^2\,MSE(\hat{\bar Y}_{Ri}) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\bar Y_i^2\left(C_{iX}^2 + C_{iY}^2 - 2\rho_iC_{iX}C_{iY}\right) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i(N_i - 1)}\sum_{j=1}^{N_i}(y_{ij} - R_ix_{ij})^2.$$

An estimate of $MSE(\hat{\bar Y}_{Rs})$ can be found by substituting the unbiased estimators $s_{ix}^2$, $s_{iy}^2$ and $s_{ixy}$ of $S_{iX}^2$, $S_{iY}^2$ and $S_{iXY}$ respectively for the ith stratum, and estimating $R_i = \bar Y_i/\bar X_i$ by $r_i = \bar y_i/\bar x_i$:

$$\widehat{MSE}(\hat{\bar Y}_{Rs}) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left(s_{iy}^2 + r_i^2s_{ix}^2 - 2r_is_{ixy}\right).$$

Also,

$$\widehat{MSE}(\hat{\bar Y}_{Rs}) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i(n_i - 1)}\sum_{j=1}^{n_i}(y_{ij} - r_ix_{ij})^2.$$
Properties of the combined ratio estimator:

Here

$$\hat{\bar Y}_{Rc} = \frac{\sum_{i=1}^{k} w_i\bar y_i}{\sum_{i=1}^{k} w_i\bar x_i}\bar X = \frac{\bar y_{st}}{\bar x_{st}}\bar X = \hat R_c\bar X.$$

It is difficult to find exact expressions for the bias and mean squared error of $\hat{\bar Y}_{Rc}$, so we find their approximate expressions. Define

$$\varepsilon_1 = \frac{\bar y_{st} - \bar Y}{\bar Y}, \qquad \varepsilon_2 = \frac{\bar x_{st} - \bar X}{\bar X},$$

so that $E(\varepsilon_1) = 0$, $E(\varepsilon_2) = 0$, and

$$E(\varepsilon_1^2) = \sum_{i=1}^{k}\frac{N_i - n_i}{N_in_i}\,\frac{w_i^2S_{iY}^2}{\bar Y^2} = \sum_{i=1}^{k}\frac{f_iw_i^2}{n_i}\,\frac{S_{iY}^2}{\bar Y^2}, \qquad E(\varepsilon_2^2) = \sum_{i=1}^{k}\frac{f_iw_i^2}{n_i}\,\frac{S_{iX}^2}{\bar X^2}, \qquad E(\varepsilon_1\varepsilon_2) = \sum_{i=1}^{k}\frac{f_iw_i^2}{n_i}\,\frac{S_{iXY}}{\bar X\bar Y}.$$
Thus, assuming $|\varepsilon_2| < 1$,

$$\hat{\bar Y}_{Rc} = \frac{(1 + \varepsilon_1)\bar Y}{(1 + \varepsilon_2)\bar X}\bar X = \bar Y(1 + \varepsilon_1)(1 - \varepsilon_2 + \varepsilon_2^2 - \ldots) = \bar Y(1 + \varepsilon_1 - \varepsilon_2 - \varepsilon_1\varepsilon_2 + \varepsilon_2^2 - \ldots).$$

Retaining the terms up to order two, for the same reason as in the case of $\hat{\bar Y}_R$,

$$\hat{\bar Y}_{Rc} \approx \bar Y(1 + \varepsilon_1 - \varepsilon_2 - \varepsilon_1\varepsilon_2 + \varepsilon_2^2), \qquad \hat{\bar Y}_{Rc} - \bar Y \approx \bar Y(\varepsilon_1 - \varepsilon_2 - \varepsilon_1\varepsilon_2 + \varepsilon_2^2).$$

The approximate bias of $\hat{\bar Y}_{Rc}$ up to the second order of approximation is

$$Bias(\hat{\bar Y}_{Rc}) = E(\hat{\bar Y}_{Rc} - \bar Y) \approx \bar Y\left[0 - 0 - E(\varepsilon_1\varepsilon_2) + E(\varepsilon_2^2)\right] = \bar Y\sum_{i=1}^{k}\frac{f_iw_i^2}{n_i}\left(\frac{S_{iX}^2}{\bar X^2} - \frac{S_{iXY}}{\bar X\bar Y}\right)$$
$$= \bar Y\sum_{i=1}^{k}\frac{f_iw_i^2}{n_i}\left(\frac{S_{iX}^2}{\bar X^2} - \frac{\rho_iS_{iX}S_{iY}}{\bar X\bar Y}\right) = \frac{\bar Y}{\bar X}\sum_{i=1}^{k}\frac{f_iw_i^2}{n_i}S_{iX}\left(\frac{S_{iX}}{\bar X} - \frac{\rho_iS_{iY}}{\bar Y}\right) = R\sum_{i=1}^{k}\frac{f_iw_i^2}{n_i}S_{iX}\left(C_{iX} - \rho_iC_{iY}\right),$$

where $R = \bar Y/\bar X$, $\rho_i$ is the correlation coefficient between the observations on Y and X in the ith stratum, and here $C_{iX} = S_{iX}/\bar X$ and $C_{iY} = S_{iY}/\bar Y$ denote the coefficients of variation of X and Y in the ith stratum (taken with respect to the overall means).

The mean squared error up to second order of approximation is

MSE (YˆRc ) = E (YˆRc − Y ) 2


 Y 2 E (ε1 − ε 2 − ε1ε 2 + ε 2 ) 2
 Y 2 E (ε12 + ε 22 − 2ε1ε 2 )
k
f  S 2 S 2 2S 
= Y 2 ∑  i wi2  iX2 + iY2 − iXY 
i =1  ni X Y XY 
k
f  S2 S2 S S 
= Y 2 ∑  i wi2  iX2 + iY2 − 2 ρ i iX iY  
i =1  ni X Y X Y 

Y 2 k  fi 2  Y 2 2 Y 
= ∑  wi  X 2 SiX + SiY2 − 2ρi X SiX SiY 
Y 2 i =1  ni  
k
f 
= ∑  i wi2 ( R 2 SiX2 + SiY2 − 2 ρ i RSiX SiY ) .
i =1  ni 

8
An estimate of $MSE(\hat{\bar Y}_{Rc})$ can be obtained by replacing $S_{iX}^2$, $S_{iY}^2$ and $S_{iXY}$ by their unbiased estimators $s_{ix}^2$, $s_{iy}^2$ and $s_{ixy}$ respectively, and $R = \bar Y/\bar X$ by $r = \bar y_{st}/\bar x_{st}$. Thus the following estimate is obtained:

$$\widehat{MSE}(\hat{\bar Y}_{Rc}) = \bar y_{st}^2\sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left(\frac{s_{ix}^2}{\bar X^2} + \frac{s_{iy}^2}{\bar y_{st}^2} - \frac{2s_{ixy}}{\bar X\bar y_{st}}\right) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left(r^2s_{ix}^2 + s_{iy}^2 - 2rs_{ixy}\right),$$

where $\bar X$ is known.
Comparison of the combined and separate ratio estimators

An obvious question is which of the estimators $\hat{\bar Y}_{Rs}$ or $\hat{\bar Y}_{Rc}$ is better, so we compare their MSEs. Note that the only difference in the terms of these MSEs is due to the form of the ratio estimate: $R_i = \frac{\bar Y_i}{\bar X_i}$ appears in $MSE(\hat{\bar Y}_{Rs})$, while $R = \frac{\bar Y}{\bar X}$ appears in $MSE(\hat{\bar Y}_{Rc})$. Thus

$$\Delta = MSE(\hat{\bar Y}_{Rc}) - MSE(\hat{\bar Y}_{Rs}) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left[(R^2 - R_i^2)S_{iX}^2 + 2(R_i - R)\rho_iS_{iX}S_{iY}\right]$$
$$= \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left[(R - R_i)^2S_{iX}^2 + 2(R - R_i)\left(R_iS_{iX}^2 - \rho_iS_{iX}S_{iY}\right)\right].$$

The difference $\Delta$ depends on:
i. the magnitude of the difference between the stratum ratios $R_i$ and the whole population ratio R;
ii. the value of $(R_iS_{iX}^2 - \rho_iS_{iX}S_{iY})$, which is usually small and vanishes when the regression line of y on x is linear and passes through the origin within each stratum. In such a case

$$MSE(\hat{\bar Y}_{Rc}) > MSE(\hat{\bar Y}_{Rs}), \quad \text{but} \quad Bias(\hat{\bar Y}_{Rc}) < Bias(\hat{\bar Y}_{Rs}).$$

So unless $R_i$ varies considerably, the use of $\hat{\bar Y}_{Rc}$ provides an estimate of $\bar Y$ with negligible bias and precision almost as good as $\hat{\bar Y}_{Rs}$.

• If $R_i \neq R$, $\hat{\bar Y}_{Rs}$ can be more precise but its bias may be large.
• If $R_i \approx R$, $\hat{\bar Y}_{Rc}$ can be as precise as $\hat{\bar Y}_{Rs}$ but its bias will be small. It also does not require knowledge of $\bar X_1, \bar X_2, \ldots, \bar X_k$.
Sampling Theory
MODULE V
LECTURE - 18
RATIO AND PRODUCT
METHODS OF ESTIMATION

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Ratio estimators with reduced bias:

Ratio-type estimators that are unbiased or have smaller bias than $\hat R$, $\hat{\bar Y}_R$ or $\hat Y_{R(tot)}$ are useful in sample surveys. There are several approaches to derive such estimators; we consider here two such approaches.

1. Unbiased ratio-type estimators:

Under SRS, the ratio estimator has the form $\frac{\bar y}{\bar x}\bar X$ for estimating the population mean $\bar Y$. As an alternative to this, we consider the following estimator of the population mean:

$$\hat{\bar Y}_{R0} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i}{x_i}\right)\bar X.$$

Let

$$R_i = \frac{Y_i}{X_i}, \ i = 1, 2, \ldots, N; \quad \text{then} \quad \hat{\bar Y}_{R0} = \bar r\bar X \quad \text{where} \quad \bar r = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i}.$$
Now

$$Bias(\hat{\bar Y}_{R0}) = E(\hat{\bar Y}_{R0}) - \bar Y = E(\bar r)\bar X - \bar Y.$$

Since, under SRSWOR,

$$E(\bar r) = \frac{1}{N}\sum_{i=1}^{N} R_i = \bar R,$$

we have

$$Bias(\hat{\bar Y}_{R0}) = \bar R\bar X - \bar Y.$$

Using the result that under SRSWOR $Cov(\bar x, \bar y) = \frac{N-n}{Nn}S_{XY}$, it also follows that

$$Cov(\bar r, \bar x) = \frac{N-n}{Nn}\,\frac{1}{N-1}\sum_{i=1}^{N}(R_i - \bar R)(X_i - \bar X) = \frac{N-n}{Nn}\,\frac{1}{N-1}\left[\sum_{i=1}^{N} R_iX_i - N\bar R\bar X\right]$$
$$= \frac{N-n}{Nn}\,\frac{1}{N-1}\left[\sum_{i=1}^{N}\frac{Y_i}{X_i}X_i - N\bar R\bar X\right] = \frac{N-n}{Nn}\,\frac{1}{N-1}\left(N\bar Y - N\bar R\bar X\right) = \frac{N-n}{n(N-1)}\left[-Bias(\hat{\bar Y}_{R0})\right].$$

Thus

$$Bias(\hat{\bar Y}_{R0}) = -\frac{n(N-1)}{N-n}\,Cov(\bar r, \bar x) = -\frac{n(N-1)}{N-n}\cdot\frac{N-n}{Nn}S_{RX} = -\frac{N-1}{N}S_{RX}$$

where

$$S_{RX} = \frac{1}{N-1}\sum_{i=1}^{N}(R_i - \bar R)(X_i - \bar X).$$
The following result helps in obtaining an unbiased estimator of the population mean. Under the SRSWOR setup,

$$E(s_{xy}) = S_{xy} \quad \text{where} \quad s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y), \qquad S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar X)(Y_i - \bar Y).$$

So an unbiased estimator of the bias $Bias(\hat{\bar Y}_{R0}) = -\frac{N-1}{N}S_{RX}$ is obtained as

$$\widehat{Bias}(\hat{\bar Y}_{R0}) = -\frac{N-1}{N}s_{rx} = -\frac{N-1}{N(n-1)}\sum_{i=1}^{n}(r_i - \bar r)(x_i - \bar x) = -\frac{N-1}{N(n-1)}\left[\sum_{i=1}^{n} r_ix_i - n\bar r\bar x\right]$$
$$= -\frac{N-1}{N(n-1)}\left[\sum_{i=1}^{n}\frac{y_i}{x_i}x_i - n\bar r\bar x\right] = -\frac{N-1}{N(n-1)}\left(n\bar y - n\bar r\bar x\right) = -\frac{n(N-1)}{N(n-1)}(\bar y - \bar r\bar x).$$

So

$$E\left[\hat{\bar Y}_{R0} - \widehat{Bias}(\hat{\bar Y}_{R0})\right] = \bar Y, \quad \text{i.e.,} \quad E\left[\hat{\bar Y}_{R0} + \frac{n(N-1)}{N(n-1)}(\bar y - \bar r\bar x)\right] = \bar Y.$$

Thus

$$\hat{\bar Y}_{R0} + \frac{n(N-1)}{N(n-1)}(\bar y - \bar r\bar x) = \bar r\bar X + \frac{n(N-1)}{N(n-1)}(\bar y - \bar r\bar x)$$

is an unbiased estimator of the population mean.
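A minimal Python sketch of this bias-corrected estimator, assuming a hypothetical paired sample and a known $\bar X$:

```python
import numpy as np

# Unbiased ratio-type estimator: r_bar * X_bar plus the bias correction term
x = np.array([4.0, 6.5, 5.2, 7.1, 3.8, 6.0])
y = np.array([9.1, 13.8, 11.0, 15.2, 8.0, 12.9])
N, Xbar = 200, 5.5
n = len(x)

r_bar = np.mean(y / x)
Y_R0 = r_bar * Xbar                                   # biased version r_bar * X_bar
correction = n * (N - 1) / (N * (n - 1)) * (y.mean() - r_bar * x.mean())
Y_unbiased = Y_R0 + correction                        # exactly unbiased under SRSWOR
print(Y_R0, Y_unbiased)
```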
2. Jackknife method for obtaining a ratio estimator with lower bias

The jackknife method is used to get rid of the term of order 1/n from the bias of an estimator. Suppose $E(\hat R)$ can be expanded, after ignoring the finite population correction, as

$$E(\hat R) = R + \frac{a_1}{n} + \frac{a_2}{n^2} + \ldots$$

Let n = mg and let the sample be divided at random into g groups, each of size m. Then

$$E(g\hat R) = gR + \frac{ga_1}{gm} + \frac{ga_2}{g^2m^2} + \ldots = gR + \frac{a_1}{m} + \frac{a_2}{gm^2} + \ldots$$

Let

$$\hat R_i^* = \frac{\sum^* y_j}{\sum^* x_j}$$

where $\sum^*$ denotes the summation over all values of the sample except the ith group. So $\hat R_i^*$ is based on a simple random sample of size m(g − 1), and we can express

$$E(\hat R_i^*) = R + \frac{a_1}{m(g-1)} + \frac{a_2}{m^2(g-1)^2} + \ldots \quad \text{or} \quad E\left[(g-1)\hat R_i^*\right] = (g-1)R + \frac{a_1}{m} + \frac{a_2}{m^2(g-1)} + \ldots$$

Thus

$$E\left[g\hat R - (g-1)\hat R_i^*\right] = R - \frac{a_2}{g(g-1)m^2} + \ldots = R - \frac{a_2}{n^2}\,\frac{g}{g-1} + \ldots,$$

hence the bias of $g\hat R - (g-1)\hat R_i^*$ is of order $\frac{1}{n^2}$. Now g estimators of this form can be obtained, one for each group. The jackknife or Quenouille estimator is the average of these estimators:

$$\hat R_Q = g\hat R - (g-1)\,\frac{\sum_{i=1}^{g}\hat R_i^*}{g}.$$
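A minimal Python sketch of Quenouille's jackknifed ratio estimator, using randomly generated data and an arbitrary split into g groups:

```python
import numpy as np

# Quenouille's jackknifed ratio estimator R_Q = g*R_hat - (g-1)*mean(R_i*)
rng = np.random.default_rng(1)
x = rng.uniform(2, 8, size=12)
y = 2.1 * x + rng.normal(0, 0.5, size=12)   # hypothetical paired sample
g, m = 4, 3
idx = rng.permutation(12).reshape(g, m)     # random groups of equal size m

R_hat = y.sum() / x.sum()
R_star = np.array([(y.sum() - y[grp].sum()) / (x.sum() - x[grp].sum())
                   for grp in idx])         # leave-one-group-out ratios
R_Q = g * R_hat - (g - 1) * R_star.mean()
print(R_hat, R_Q)
```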
Sampling Theory
MODULE V
LECTURE - 19
RATIO AND PRODUCT
METHODS OF ESTIMATION

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Product method of estimation:

The ratio estimator is more efficient than the mean of a SRSWOR sample if $\rho > \frac{1}{2}\,\frac{C_x}{C_y}$, provided R > 0, which is usually the case. This shows that if the auxiliary information is such that $\rho < -\frac{1}{2}\,\frac{C_x}{C_y}$, then we cannot use the ratio method of estimation to improve the sample mean as an estimator of the population mean. So there is a need for another type of estimator which also makes use of information on the auxiliary variable x. The product estimator is an attempt in this direction.

The product estimator of the population mean $\bar Y$ is defined as

$$\hat{\bar Y}_p = \frac{\bar y\,\bar x}{\bar X},$$

assuming the population mean $\bar X$ to be known. We now derive the bias and variance of $\hat{\bar Y}_p$. Let

$$\varepsilon_0 = \frac{\bar y - \bar Y}{\bar Y}, \qquad \varepsilon_1 = \frac{\bar x - \bar X}{\bar X}.$$
(i) Bias of $\hat{\bar Y}_p$

We write $\hat{\bar Y}_p$ as

$$\hat{\bar Y}_p = \frac{\bar y\,\bar x}{\bar X} = \bar Y(1 + \varepsilon_0)(1 + \varepsilon_1) = \bar Y(1 + \varepsilon_0 + \varepsilon_1 + \varepsilon_0\varepsilon_1).$$

Taking expectation, we obtain the bias of $\hat{\bar Y}_p$ as

$$Bias(\hat{\bar Y}_p) = \frac{1}{\bar X}Cov(\bar y, \bar x) = \frac{f}{n\bar X}S_{xy},$$

which shows that the bias of $\hat{\bar Y}_p$ decreases as n increases. The bias of $\hat{\bar Y}_p$ can be estimated by

$$\widehat{Bias}(\hat{\bar Y}_p) = \frac{f}{n\bar X}s_{xy}.$$

(ii) Variance of $\hat{\bar Y}_p$

Writing $\hat{\bar Y}_p$ in terms of $\varepsilon_0$ and $\varepsilon_1$, the variance of the product estimator $\hat{\bar Y}_p$ up to the second order of approximation is given by

$$Var(\hat{\bar Y}_p) = E(\hat{\bar Y}_p - \bar Y)^2 = \bar Y^2E(\varepsilon_0 + \varepsilon_1 + \varepsilon_0\varepsilon_1)^2 \approx \bar Y^2E(\varepsilon_0^2 + \varepsilon_1^2 + 2\varepsilon_0\varepsilon_1),$$

where terms in $(\varepsilon_0, \varepsilon_1)$ of degree greater than two are assumed to be negligible. Using the expected values, we find that

$$Var(\hat{\bar Y}_p) = \frac{f}{n}\left(S_Y^2 + R^2S_X^2 + 2RS_{XY}\right).$$
(iii) Estimation of the variance of $\hat{\bar Y}_p$

The variance of $\hat{\bar Y}_p$ can be estimated by

$$\widehat{Var}(\hat{\bar Y}_p) = \frac{f}{n}\left(s_y^2 + r^2s_x^2 + 2rs_{xy}\right), \quad \text{where } r = \bar y/\bar x.$$

(iv) Comparison with SRSWOR:

From the variances of the mean under SRSWOR and the product estimator, we obtain

$$Var_{SRS}(\bar y) - Var(\hat{\bar Y}_p) = -\frac{f}{n}RS_X(2\rho S_Y + RS_X),$$

which shows that $\hat{\bar Y}_p$ is more efficient than the simple mean $\bar y$ for

$$\rho < -\frac{1}{2}\,\frac{C_x}{C_y} \ \text{ if } R > 0 \quad \text{and} \quad \rho > -\frac{1}{2}\,\frac{C_x}{C_y} \ \text{ if } R < 0.$$
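A minimal Python sketch of the product estimate and its variance estimate, assuming a hypothetical negatively correlated sample and known $\bar X$:

```python
import numpy as np

# Product estimate of Y_bar with estimated variance (negatively correlated x, y)
x = np.array([8.0, 6.2, 7.5, 5.0, 9.1, 6.8])
y = np.array([3.1, 4.8, 3.6, 5.9, 2.5, 4.2])   # hypothetical data with rho < 0
N, Xbar = 150, 7.0
n = len(x); f = (N - n) / N

Y_p = y.mean() * x.mean() / Xbar               # product estimator
r = y.mean() / x.mean()
s2y, s2x = y.var(ddof=1), x.var(ddof=1)
sxy = np.cov(x, y, ddof=1)[0, 1]
var_hat = f / n * (s2y + r**2 * s2x + 2 * r * sxy)
print(Y_p, var_hat)
```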
Multivariate ratio estimator

Let y be the study variable and $X_1, X_2, \ldots, X_p$ be p auxiliary variables assumed to be correlated with y. Further, it is assumed that $X_1, X_2, \ldots, X_p$ are independent. Let $\bar Y, \bar X_1, \bar X_2, \ldots, \bar X_p$ be the population means of the variables y, $X_1, X_2, \ldots, X_p$. We assume that a SRSWOR sample of size n is selected from the population of N units. The following notations will be used:

$S_i^2$ : the population mean sum of squares for the variate $X_i$,
$s_i^2$ : the sample mean sum of squares for the variate $X_i$,
$S_0^2$ : the population mean sum of squares for the study variable y,
$s_0^2$ : the sample mean sum of squares for the study variable y,
$C_i = \frac{S_i}{\bar X_i}$ : coefficient of variation of the variate $X_i$,
$C_0 = \frac{S_0}{\bar Y}$ : coefficient of variation of the variate y,
$\rho_i = \frac{S_{iy}}{S_iS_0}$ : coefficient of correlation between y and $X_i$,
$\hat{\bar Y}_{Ri} = \frac{\bar y}{\bar x_i}\bar X_i$ : ratio estimator of $\bar Y$ based on $X_i$,

where $i = 1, 2, \ldots, p$. Then the multivariate ratio estimator of $\bar Y$ is given by

$$\hat{\bar Y}_{MR} = \sum_{i=1}^{p} w_i\hat{\bar Y}_{Ri} = \bar y\sum_{i=1}^{p} w_i\frac{\bar X_i}{\bar x_i}, \qquad \sum_{i=1}^{p} w_i = 1.$$

(i) Bias of the multivariate ratio estimator:

The bias of $\hat{\bar Y}_{Ri}$ is

$$Bias(\hat{\bar Y}_{Ri}) = \frac{f}{n}\bar Y(C_i^2 - \rho_iC_iC_0),$$

and the bias of $\hat{\bar Y}_{MR}$ is obtained as

$$Bias(\hat{\bar Y}_{MR}) = \frac{f\bar Y}{n}\sum_{i=1}^{p} w_i(C_i^2 - \rho_iC_iC_0) = \frac{f\bar Y}{n}\sum_{i=1}^{p} w_iC_i(C_i - \rho_iC_0).$$

(ii) Variance of the multivariate ratio estimator:

The variance of $\hat{\bar Y}_{Ri}$ is given by

$$Var(\hat{\bar Y}_{Ri}) = \frac{f}{n}\bar Y^2(C_0^2 + C_i^2 - 2\rho_iC_0C_i),$$

and the variance of $\hat{\bar Y}_{MR}$ is obtained as

$$Var(\hat{\bar Y}_{MR}) = \frac{f}{n}\bar Y^2\sum_{i=1}^{p} w_i^2(C_0^2 + C_i^2 - 2\rho_iC_0C_i).$$
Sampling Theory
MODULE VI
LECTURE - 20
REGRESSION METHOD OF
ESTIMATION

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
The ratio method of estimation uses auxiliary information which is correlated with the study variable to improve the precision, and results in improved estimators when the regression of y on x is linear and passes through the origin. When the regression of y on x is linear, it is not necessary that the line should always pass through the origin. Under such conditions, it is more appropriate to use regression-type estimators.

In the ratio method, the conventional estimator, the sample mean $\bar y$, was improved by multiplying it by the factor $\frac{\bar X}{\bar x}$, where $\bar x$ is an unbiased estimator of the population mean $\bar X$ of the auxiliary variable. Now we consider another idea, based on a difference.

Consider the statistic $(\bar x - \bar X)$, for which $E(\bar x - \bar X) = 0$. Consider an improved estimator of $\bar Y$ of the form

$$\hat{\bar Y}^* = \bar y + \mu(\bar x - \bar X),$$

which is an unbiased estimator of $\bar Y$ for any constant $\mu$. Now find $\mu$ such that $Var(\hat{\bar Y}^*)$ is minimum:

$$Var(\hat{\bar Y}^*) = Var(\bar y) + \mu^2Var(\bar x) + 2\mu\,Cov(\bar x, \bar y),$$

$$\frac{\partial Var(\hat{\bar Y}^*)}{\partial\mu} = 0 \ \Rightarrow\ \mu = -\frac{Cov(\bar x, \bar y)}{Var(\bar x)} = -\frac{\frac{N-n}{Nn}S_{XY}}{\frac{N-n}{Nn}S_X^2} = -\frac{S_{XY}}{S_X^2}$$

where

$$S_{XY} = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar X)(Y_i - \bar Y), \qquad S_X^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar X)^2.$$
Note that the regression coefficient $\beta$ in a linear regression model $y = \beta x + e$ of $y$ on $x$, obtained by minimizing $\sum_{i=1}^{n}e_i^2$ based on $n$ data points $(x_i,y_i),\ i=1,2,\ldots,n$, is $\beta = Cov(x,y)/Var(x) = s_{xy}/s_x^2$. Thus the optimum value of $\mu$ is the same as the regression coefficient of $y$ on $x$ with a negative sign, i.e., $\mu = -\beta$.

So the estimator $\hat{Y}^*$ with the optimum value of $\mu$ is
$$\hat{Y}_{reg} = \bar{y} + \beta(\bar{X}-\bar{x}),$$
which is the regression estimator of $\bar{Y}$, and the procedure is called the regression method of estimation.

The variance of $\hat{Y}_{reg}$ is
$$Var(\hat{Y}_{reg}) = Var(\bar{y})\left[1-\rho^2(\bar{x},\bar{y})\right],$$
where $\rho(\bar{x},\bar{y})$ is the correlation coefficient between $\bar{x}$ and $\bar{y}$. So $\hat{Y}_{reg}$ is efficient when $x$ and $y$ are highly correlated. The estimator $\hat{Y}_{reg}$ is more efficient than $\bar{y}$ whenever $\rho(\bar{x},\bar{y}) \neq 0$, which generally holds.
Regression estimates with pre-assigned $\beta$:

If the value of $\beta$ is known, say $\beta_0$, then the regression estimator is
$$\hat{Y}_{reg} = \bar{y} + \beta_0(\bar{X}-\bar{x}).$$

Bias of $\hat{Y}_{reg}$:

Assuming that the random sample $(x_i,y_i),\ i=1,2,\ldots,n$ is drawn by SRSWOR,
$$E(\hat{Y}_{reg}) = E(\bar{y}) + \beta_0\left[\bar{X}-E(\bar{x})\right] = \bar{Y} + \beta_0\left[\bar{X}-\bar{X}\right] = \bar{Y}.$$
Thus $\hat{Y}_{reg}$ is an unbiased estimator of $\bar{Y}$ when $\beta$ is known.

Variance of $\hat{Y}_{reg}$:
$$Var(\hat{Y}_{reg}) = E\left[\hat{Y}_{reg}-E(\hat{Y}_{reg})\right]^2 = E\left[\bar{y}+\beta_0(\bar{X}-\bar{x})-\bar{Y}\right]^2 = E\left[(\bar{y}-\bar{Y})-\beta_0(\bar{x}-\bar{X})\right]^2$$
$$= E(\bar{y}-\bar{Y})^2 + \beta_0^2E(\bar{x}-\bar{X})^2 - 2\beta_0E\left[(\bar{x}-\bar{X})(\bar{y}-\bar{Y})\right]$$
$$= Var(\bar{y}) + \beta_0^2Var(\bar{x}) - 2\beta_0Cov(\bar{x},\bar{y})$$
$$= \frac{f}{n}\left[S_Y^2+\beta_0^2S_X^2-2\beta_0S_{XY}\right] = \frac{f}{n}\left[S_Y^2+\beta_0^2S_X^2-2\beta_0\rho S_XS_Y\right],$$
where
$$f = \frac{N-n}{N}, \qquad S_X^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar{X})^2, \qquad S_Y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i-\bar{Y})^2,$$
and $\rho$ is the correlation coefficient between $X$ and $Y$.

Comparing $Var(\hat{Y}_{reg})$ with $Var(\bar{y})$, we note that
$$Var(\hat{Y}_{reg}) < Var(\bar{y}) \quad\text{if}\quad \beta_0^2S_X^2-2\beta_0S_{XY} < 0, \quad\text{i.e.,}\quad \beta_0S_X^2\left(\beta_0-\frac{2S_{XY}}{S_X^2}\right) < 0,$$
which is possible when either
$$\beta_0 < 0 \ \text{and}\ \left(\beta_0-\frac{2S_{XY}}{S_X^2}\right) > 0 \;\Rightarrow\; \frac{2S_{XY}}{S_X^2} < \beta_0 < 0,$$
or
$$\beta_0 > 0 \ \text{and}\ \left(\beta_0-\frac{2S_{XY}}{S_X^2}\right) < 0 \;\Rightarrow\; 0 < \beta_0 < \frac{2S_{XY}}{S_X^2}.$$
Optimal value of $\beta$:

Choose $\beta$ such that $Var(\hat{Y}_{reg})$ is minimum. So
$$\frac{\partial Var(\hat{Y}_{reg})}{\partial\beta} = \frac{\partial}{\partial\beta}\left[S_Y^2+\beta^2S_X^2-2\beta\rho S_XS_Y\right] = 0 \;\Rightarrow\; \beta = \rho\frac{S_Y}{S_X} = \frac{S_{XY}}{S_X^2}.$$
The minimum value of the variance of $\hat{Y}_{reg}$ with the optimum value $\beta_{opt} = \rho S_Y/S_X$ is
$$Var_{min}(\hat{Y}_{reg}) = \frac{f}{n}\left[S_Y^2+\rho^2\frac{S_Y^2}{S_X^2}S_X^2-2\rho\frac{S_Y}{S_X}\rho S_XS_Y\right] = \frac{f}{n}S_Y^2(1-\rho^2).$$
Since $-1 \le \rho \le 1$,
$$Var(\hat{Y}_{reg}) \le Var_{SRS}(\bar{y}),$$
which always holds true.
Departure from $\beta$:

If $\beta_0$ is a preassigned value of the regression coefficient, then
$$Var(\hat{Y}_{reg}) = \frac{f}{n}\left[S_Y^2+\beta_0^2S_X^2-2\beta_0\rho S_XS_Y\right]$$
$$= \frac{f}{n}\left[S_Y^2+\beta_0^2S_X^2-2\rho\beta_0S_XS_Y-\rho^2S_Y^2+\rho^2S_Y^2\right]$$
$$= \frac{f}{n}\left[(1-\rho^2)S_Y^2+\beta_0^2S_X^2-2\beta_0S_X^2\beta_{opt}+\beta_{opt}^2S_X^2\right]$$
$$= \frac{f}{n}\left[(1-\rho^2)S_Y^2+(\beta_0-\beta_{opt})^2S_X^2\right],$$
where $\beta_{opt} = \rho S_Y/S_X$. The second term measures the loss in efficiency from using $\beta_0$ in place of $\beta_{opt}$.

Estimate of variance:

An unbiased sample estimate of $Var(\hat{Y}_{reg})$ is
$$\widehat{Var}(\hat{Y}_{reg}) = \frac{f}{n(n-1)}\sum_{i=1}^{n}\left[(y_i-\bar{y})-\beta_0(x_i-\bar{x})\right]^2 = \frac{f}{n}\left(s_y^2+\beta_0^2s_x^2-2\beta_0s_{xy}\right).$$
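The estimator with a preassigned coefficient and its variance estimate are straightforward to compute. A minimal Python sketch (numpy assumed; the sample values, $\bar{X}$, $\beta_0$ and $N$ below are hypothetical illustrative numbers):

    import numpy as np

    def reg_estimate_fixed_beta(y, x, X_bar, beta0, N):
        """Regression estimate with preassigned beta0 and its variance estimate."""
        y, x = np.asarray(y, float), np.asarray(x, float)
        n = y.size
        f = (N - n) / N
        y_reg = y.mean() + beta0 * (X_bar - x.mean())
        # unbiased variance estimate: (f / (n(n-1))) * sum of squared residuals
        resid = (y - y.mean()) - beta0 * (x - x.mean())
        var_hat = f / (n * (n - 1)) * np.sum(resid ** 2)
        return y_reg, var_hat

    est, v = reg_estimate_fixed_beta(
        y=[20.0, 24.0, 19.0, 23.0, 22.0],
        x=[5.0, 6.5, 4.8, 6.0, 5.6],
        X_bar=5.5, beta0=3.0, N=500)
    print(est, v)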
Sampling Theory
MODULE VI
LECTURE - 21
REGRESSION METHOD OF
ESTIMATION

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Regression estimates when $\beta$ is computed from the sample:

Suppose a random sample of size $n$, $(x_i,y_i),\ i=1,2,\ldots,n$, is drawn by SRSWOR. When $\beta$ is unknown, it is estimated as
$$\hat{\beta} = \frac{s_{xy}}{s_x^2},$$
and then the regression estimator of $\bar{Y}$ is
$$\hat{Y}_{reg} = \bar{y} + \hat{\beta}(\bar{X}-\bar{x}).$$
It is difficult to find the exact expressions of $E(\hat{Y}_{reg})$ and $Var(\hat{Y}_{reg})$, so we approximate them using the same methodology as in the case of the ratio method of estimation. Let
$$\varepsilon_0 = \frac{\bar{y}-\bar{Y}}{\bar{Y}} \Rightarrow \bar{y} = \bar{Y}(1+\varepsilon_0), \qquad \varepsilon_1 = \frac{\bar{x}-\bar{X}}{\bar{X}} \Rightarrow \bar{x} = \bar{X}(1+\varepsilon_1),$$
$$\varepsilon_2 = \frac{s_{xy}-S_{XY}}{S_{XY}} \Rightarrow s_{xy} = S_{XY}(1+\varepsilon_2), \qquad \varepsilon_3 = \frac{s_x^2-S_X^2}{S_X^2} \Rightarrow s_x^2 = S_X^2(1+\varepsilon_3).$$
Then
$$E(\varepsilon_0) = E(\varepsilon_1) = E(\varepsilon_2) = E(\varepsilon_3) = 0,$$
$$E(\varepsilon_0^2) = \frac{f}{n}C_Y^2, \qquad E(\varepsilon_1^2) = \frac{f}{n}C_X^2, \qquad E(\varepsilon_0\varepsilon_1) = \frac{f}{n}\rho C_XC_Y,$$
and
$$\hat{Y}_{reg} = \bar{y} + \frac{s_{xy}}{s_x^2}(\bar{X}-\bar{x}) = \bar{Y}(1+\varepsilon_0) + \frac{S_{XY}(1+\varepsilon_2)}{S_X^2(1+\varepsilon_3)}(-\varepsilon_1\bar{X}).$$

The estimation error of $\hat{Y}_{reg}$ is
$$\hat{Y}_{reg}-\bar{Y} = \bar{Y}\varepsilon_0 - \beta\bar{X}\varepsilon_1(1+\varepsilon_2)(1+\varepsilon_3)^{-1},$$
where $\beta = S_{XY}/S_X^2$ is the population regression coefficient. Assuming $|\varepsilon_3| < 1$,
$$\hat{Y}_{reg}-\bar{Y} = \bar{Y}\varepsilon_0 - \beta\bar{X}(\varepsilon_1+\varepsilon_1\varepsilon_2)(1-\varepsilon_3+\varepsilon_3^2-\cdots).$$
Retaining the terms up to the second power of the $\varepsilon$'s and ignoring the others, we have
$$\hat{Y}_{reg}-\bar{Y} \approx \bar{Y}\varepsilon_0 - \beta\bar{X}(\varepsilon_1+\varepsilon_1\varepsilon_2)(1-\varepsilon_3+\varepsilon_3^2) \approx \bar{Y}\varepsilon_0 - \beta\bar{X}(\varepsilon_1-\varepsilon_1\varepsilon_3+\varepsilon_1\varepsilon_2).$$
Bias of $\hat{Y}_{reg}$:

Now the bias of $\hat{Y}_{reg}$ up to the second order of approximation is
$$E(\hat{Y}_{reg}-\bar{Y}) \approx E\left[\bar{Y}\varepsilon_0-\beta\bar{X}(\varepsilon_1-\varepsilon_1\varepsilon_3+\varepsilon_1\varepsilon_2)\right] = -\frac{\beta\bar{X}f}{n}\left[\frac{\mu_{21}}{\bar{X}S_{XY}}-\frac{\mu_{30}}{\bar{X}S_X^2}\right],$$
where $f = \frac{N-n}{N}$ and the $(r,s)$th cross-product moment is
$$\mu_{rs} = E\left[(x-\bar{X})^r(y-\bar{Y})^s\right],$$
so that
$$\mu_{21} = E\left[(x-\bar{X})^2(y-\bar{Y})\right], \qquad \mu_{30} = E\left[(x-\bar{X})^3\right].$$
Thus
$$Bias(\hat{Y}_{reg}) = -\frac{\beta f}{n}\left[\frac{\mu_{21}}{S_{XY}}-\frac{\mu_{30}}{S_X^2}\right].$$
Also,
$$E(\hat{Y}_{reg}) = E(\bar{y}) + E[\hat{\beta}(\bar{X}-\bar{x})] = \bar{Y} + \bar{X}E(\hat{\beta}) - E(\hat{\beta}\bar{x}) = \bar{Y} + E(\bar{x})E(\hat{\beta}) - E(\hat{\beta}\bar{x}) = \bar{Y} - Cov(\hat{\beta},\bar{x}),$$
so
$$Bias(\hat{Y}_{reg}) = E(\hat{Y}_{reg}) - \bar{Y} = -Cov(\hat{\beta},\bar{x}).$$
MSE of $\hat{Y}_{reg}$:

To obtain the MSE of $\hat{Y}_{reg}$, consider
$$E(\hat{Y}_{reg}-\bar{Y})^2 \approx E\left[\bar{Y}\varepsilon_0-\beta\bar{X}(\varepsilon_1-\varepsilon_1\varepsilon_3+\varepsilon_1\varepsilon_2)\right]^2.$$
Retaining the terms of the $\varepsilon$'s up to the second power and ignoring the others, we have
$$E(\hat{Y}_{reg}-\bar{Y})^2 \approx E\left[\bar{Y}^2\varepsilon_0^2+\beta^2\bar{X}^2\varepsilon_1^2-2\beta\bar{X}\bar{Y}\varepsilon_0\varepsilon_1\right] = \bar{Y}^2E(\varepsilon_0^2)+\beta^2\bar{X}^2E(\varepsilon_1^2)-2\beta\bar{X}\bar{Y}E(\varepsilon_0\varepsilon_1)$$
$$= \frac{f}{n}\left[\bar{Y}^2\frac{S_Y^2}{\bar{Y}^2}+\beta^2\bar{X}^2\frac{S_X^2}{\bar{X}^2}-2\beta\bar{X}\bar{Y}\rho\frac{S_XS_Y}{\bar{X}\bar{Y}}\right],$$
so
$$MSE(\hat{Y}_{reg}) = E(\hat{Y}_{reg}-\bar{Y})^2 = \frac{f}{n}\left(S_Y^2+\beta^2S_X^2-2\beta\rho S_XS_Y\right).$$
Since $\beta = S_{XY}/S_X^2 = \rho S_Y/S_X$, substituting it in $MSE(\hat{Y}_{reg})$ gives
$$MSE(\hat{Y}_{reg}) = \frac{f}{n}S_Y^2(1-\rho^2).$$
So up to the second order of approximation, the regression estimator is better than the conventional sample mean estimator under SRSWOR. This is because the regression estimator also uses extra information on the auxiliary variable. Such extra information, however, requires extra cost to collect, so this comparison shows a false superiority in some sense; the regression estimator and the SRS estimate should be compared with the cost aspect also taken into consideration.
Comparison of the regression estimate with the ratio estimate and the SRS sample mean estimate:
$$MSE(\hat{Y}_{reg}) = \frac{f}{n}S_Y^2(1-\rho^2)$$
$$MSE(\hat{Y}_R) = \frac{f}{n}\left(S_Y^2+R^2S_X^2-2\rho RS_XS_Y\right)$$
$$Var_{SRS}(\bar{y}) = \frac{f}{n}S_Y^2.$$

(i) Since $MSE(\hat{Y}_{reg}) = Var_{SRS}(\bar{y})(1-\rho^2)$ and $\rho^2 < 1$, $\hat{Y}_{reg}$ is always superior to $\bar{y}$.

(ii) $\hat{Y}_{reg}$ is better than $\hat{Y}_R$ if $MSE(\hat{Y}_{reg}) \le MSE(\hat{Y}_R)$, i.e., if
$$\frac{f}{n}S_Y^2(1-\rho^2) \le \frac{f}{n}\left(S_Y^2+R^2S_X^2-2\rho RS_XS_Y\right),$$
or if $(RS_X-\rho S_Y)^2 \ge 0$, which always holds true.

So the regression estimate is always superior to the ratio estimate up to the second order of approximation.
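This ranking is easy to verify empirically. Below is a minimal Python simulation sketch (numpy assumed; the population is synthetic with a nonzero intercept in the regression of $y$ on $x$, which handicaps the ratio estimator):

    import numpy as np

    rng = np.random.default_rng(7)

    # Finite population where the regression of y on x is linear
    # with a nonzero intercept.
    N, n = 5000, 60
    x = rng.uniform(10.0, 50.0, size=N)
    y = 40.0 + 2.0 * x + rng.normal(0.0, 8.0, size=N)
    X_bar, Y_bar = x.mean(), y.mean()

    mse = {"srs": 0.0, "ratio": 0.0, "regression": 0.0}
    R = 20000
    for _ in range(R):
        s = rng.choice(N, size=n, replace=False)
        xb, yb = x[s].mean(), y[s].mean()
        b = np.cov(x[s], y[s], ddof=1)[0, 1] / np.var(x[s], ddof=1)
        mse["srs"] += (yb - Y_bar) ** 2
        mse["ratio"] += (yb / xb * X_bar - Y_bar) ** 2
        mse["regression"] += (yb + b * (X_bar - xb) - Y_bar) ** 2

    for k, v in mse.items():
        print(k, v / R)

With these illustrative parameters, the empirical MSEs should order as regression < ratio and regression < SRS, matching (i) and (ii).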
Sampling Theory
MODULE VI
LECTURE - 22
REGRESSION METHOD OF
ESTIMATION

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Regression estimates in stratified sampling

Under the setup of stratified sampling, let the population of $N$ sampling units be divided into $k$ strata. The stratum sizes are $N_1, N_2, \ldots, N_k$ such that $\sum_{i=1}^{k}N_i = N$. A sample of size $n_i$ on $(x_{ij},y_{ij}),\ j=1,2,\ldots,n_i$, is drawn from the $i$th stratum $(i=1,2,\ldots,k)$ by SRSWOR, where $x_{ij}$ and $y_{ij}$ denote the $j$th unit from the $i$th stratum on the auxiliary and study variables, respectively.

In order to estimate the population mean, there are two approaches.

1. Separate regression estimator

• Obtain the regression estimator from each stratum separately, i.e., the regression estimate in the $i$th stratum is
$$\hat{Y}_{reg(i)} = \bar{y}_i + \beta_i(\bar{X}_i-\bar{x}_i).$$
• Find the stratified mean as the weighted mean of $\hat{Y}_{reg(i)},\ i=1,2,\ldots,k$, as
$$\hat{Y}_{sreg} = \sum_{i=1}^{k}\frac{N_i\hat{Y}_{reg(i)}}{N} = \sum_{i=1}^{k}w_i\left[\bar{y}_i+\beta_i(\bar{X}_i-\bar{x}_i)\right],$$
where $\beta_i = S_{ixy}/S_{ix}^2$ and $w_i = N_i/N$.

In this approach, the regression estimator is obtained separately in each stratum and the estimates are then combined using the philosophy of stratified sampling. So $\hat{Y}_{sreg}$ is termed the separate regression estimator.
2. Combined regression estimator

Another strategy is to replace $\bar{x}$ and $\bar{y}$ in $\hat{Y}_{reg}$ by the respective stratified means. Replacing $\bar{x}$ by $\bar{x}_{st} = \sum_{i=1}^{k}w_i\bar{x}_i$ and $\bar{y}$ by $\bar{y}_{st} = \sum_{i=1}^{k}w_i\bar{y}_i$, we have
$$\hat{Y}_{creg} = \bar{y}_{st} + \beta(\bar{X}-\bar{x}_{st}).$$
In this case, all the sample information is combined first and then used in the regression estimator, so $\hat{Y}_{creg}$ is termed the combined regression estimator.

Properties of the separate and combined regression estimators

In order to derive the mean and variance of $\hat{Y}_{sreg}$ and $\hat{Y}_{creg}$, there are two cases:
• when $\beta$ is pre-assigned as $\beta_0$;
• when $\beta$ is estimated from the sample.

We consider here the case where $\beta$ is pre-assigned as $\beta_0$. The other case, where $\beta$ is estimated as $\hat{\beta} = s_{xy}/s_x^2$, can be dealt with by the same approach based on defining various $\varepsilon$'s and using the approximation theory as in the case of $\hat{Y}_{reg}$.
1. Separate regression estimator

Assume $\beta$ is known, say $\beta_{0i}$ in the $i$th stratum. Then
$$\hat{Y}_{sreg} = \sum_{i=1}^{k}w_i\left[\bar{y}_i+\beta_{0i}(\bar{X}_i-\bar{x}_i)\right]$$
$$E(\hat{Y}_{sreg}) = \sum_{i=1}^{k}w_i\left[E(\bar{y}_i)+\beta_{0i}\left(\bar{X}_i-E(\bar{x}_i)\right)\right] = \sum_{i=1}^{k}w_i\left[\bar{Y}_i+\beta_{0i}(\bar{X}_i-\bar{X}_i)\right] = \bar{Y},$$
so $\hat{Y}_{sreg}$ is unbiased. Its variance is
$$Var(\hat{Y}_{sreg}) = E\left[\hat{Y}_{sreg}-E(\hat{Y}_{sreg})\right]^2 = E\left[\sum_{i=1}^{k}w_i(\bar{y}_i-\bar{Y}_i)-\sum_{i=1}^{k}w_i\beta_{0i}(\bar{x}_i-\bar{X}_i)\right]^2$$
$$= \sum_{i=1}^{k}w_i^2E(\bar{y}_i-\bar{Y}_i)^2 + \sum_{i=1}^{k}w_i^2\beta_{0i}^2E(\bar{x}_i-\bar{X}_i)^2 - 2\sum_{i=1}^{k}w_i^2\beta_{0i}E\left[(\bar{x}_i-\bar{X}_i)(\bar{y}_i-\bar{Y}_i)\right]$$
$$= \sum_{i=1}^{k}w_i^2Var(\bar{y}_i) + \sum_{i=1}^{k}w_i^2\beta_{0i}^2Var(\bar{x}_i) - 2\sum_{i=1}^{k}w_i^2\beta_{0i}Cov(\bar{x}_i,\bar{y}_i)$$
$$= \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left(S_{iY}^2+\beta_{0i}^2S_{iX}^2-2\beta_{0i}S_{iXY}\right),$$
where the cross-product terms between different strata vanish because the samples are drawn independently across strata.
$Var(\hat{Y}_{sreg})$ is minimum when $\beta_{0i} = \frac{S_{iXY}}{S_{iX}^2}$, and substituting this $\beta_{0i}$ we have
$$Var_{min}(\hat{Y}_{sreg}) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left(S_{iY}^2-\beta_{0i}^2S_{iX}^2\right),$$
where $f_i = \frac{N_i-n_i}{N_i}$.

Since SRSWOR is followed in drawing the samples from each stratum,
$$E(s_{ix}^2) = S_{iX}^2, \qquad E(s_{iy}^2) = S_{iY}^2, \qquad E(s_{ixy}) = S_{iXY}.$$
Thus an unbiased estimator of the variance can be obtained by replacing $S_{iX}^2$, $S_{iY}^2$ and $S_{iXY}$ by their respective unbiased estimators $s_{ix}^2$, $s_{iy}^2$ and $s_{ixy}$ as
$$\widehat{Var}(\hat{Y}_{sreg}) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left(s_{iy}^2+\beta_{0i}^2s_{ix}^2-2\beta_{0i}s_{ixy}\right)$$
and
$$\widehat{Var}_{min}(\hat{Y}_{sreg}) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left(s_{iy}^2-\beta_{0i}^2s_{ix}^2\right).$$
2. Combined regression estimator:

Assume $\beta$ is known, say $\beta_0$. Then
$$\hat{Y}_{creg} = \sum_{i=1}^{k}w_i\bar{y}_i + \beta_0\left(\bar{X}-\sum_{i=1}^{k}w_i\bar{x}_i\right)$$
$$E(\hat{Y}_{creg}) = \sum_{i=1}^{k}w_iE(\bar{y}_i) + \beta_0\left[\bar{X}-\sum_{i=1}^{k}w_iE(\bar{x}_i)\right] = \sum_{i=1}^{k}w_i\bar{Y}_i + \beta_0\left[\bar{X}-\sum_{i=1}^{k}w_i\bar{X}_i\right] = \bar{Y}+\beta_0(\bar{X}-\bar{X}) = \bar{Y}.$$
Thus $\hat{Y}_{creg}$ is an unbiased estimator of $\bar{Y}$. Its variance is
$$Var(\hat{Y}_{creg}) = E\left[\hat{Y}_{creg}-E(\hat{Y}_{creg})\right]^2 = E\left[\sum_{i=1}^{k}w_i(\bar{y}_i-\bar{Y}_i)-\beta_0\sum_{i=1}^{k}w_i(\bar{x}_i-\bar{X}_i)\right]^2$$
$$= \sum_{i=1}^{k}w_i^2Var(\bar{y}_i) + \beta_0^2\sum_{i=1}^{k}w_i^2Var(\bar{x}_i) - 2\beta_0\sum_{i=1}^{k}w_i^2Cov(\bar{x}_i,\bar{y}_i)$$
$$= \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left(S_{iY}^2+\beta_0^2S_{iX}^2-2\beta_0S_{iXY}\right).$$
$Var(\hat{Y}_{creg})$ is minimum when
$$\beta_0 = \frac{Cov(\bar{x}_{st},\bar{y}_{st})}{Var(\bar{x}_{st})} = \frac{\sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}S_{iXY}}{\sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}S_{iX}^2},$$
and the minimum variance is given by
$$Var_{min}(\hat{Y}_{creg}) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left(S_{iY}^2-\beta_0^2S_{iX}^2\right).$$
Since SRSWOR is followed to draw the sample from each stratum, using
$$E(s_{ix}^2) = S_{iX}^2, \qquad E(s_{iy}^2) = S_{iY}^2, \qquad E(s_{ixy}) = S_{iXY},$$
we get the estimate of variance as
$$\widehat{Var}(\hat{Y}_{creg}) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left(s_{iy}^2+\beta_0^2s_{ix}^2-2\beta_0s_{ixy}\right)$$
and
$$\widehat{Var}_{min}(\hat{Y}_{creg}) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left(s_{iy}^2-\beta_0^2s_{ix}^2\right).$$
Comparison of $\hat{Y}_{sreg}$ and $\hat{Y}_{creg}$:

Note that, with the optimum values of the regression coefficients,
$$Var_{min}(\hat{Y}_{creg}) - Var_{min}(\hat{Y}_{sreg}) = \sum_{i=1}^{k}\frac{w_i^2f_i}{n_i}\left(\beta_{0i}-\beta_0\right)^2S_{iX}^2 \ge 0,$$
which always holds true.

So if the regression line of $y$ on $x$ is approximately linear within each stratum and the regression coefficients differ among strata, the separate regression estimator is more efficient than the combined regression estimator; when the stratum regression coefficients are all close to a common value, the two perform about equally well.
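The two estimators are simple to compute side by side. A minimal Python sketch follows (numpy assumed; all data, stratum sizes and coefficients are hypothetical, and using the $w$-weighted average of the $\beta_{0i}$ as the common coefficient for the combined estimator is an illustrative choice, not a rule from the text):

    import numpy as np

    def stratified_reg_estimates(samples, X_bars, N_strata, beta0):
        """Separate and combined regression estimates with preassigned
        regression coefficients.

        samples  : list of (y_i, x_i) arrays, one pair per stratum
        X_bars   : known stratum means of x
        N_strata : stratum sizes N_1, ..., N_k
        beta0    : preassigned coefficients, one per stratum
        """
        N = sum(N_strata)
        w = np.array(N_strata, float) / N
        yb = np.array([np.mean(y) for y, _ in samples])
        xb = np.array([np.mean(x) for _, x in samples])
        X_bars = np.asarray(X_bars, float)
        beta0 = np.asarray(beta0, float)

        y_sreg = np.sum(w * (yb + beta0 * (X_bars - xb)))        # separate
        b_comb = float(np.sum(w * beta0))                        # illustrative common beta
        X_bar = float(np.sum(w * X_bars))
        y_creg = np.sum(w * yb) + b_comb * (X_bar - np.sum(w * xb))  # combined
        return y_sreg, y_creg

    samples = [(np.array([10., 12., 11.]), np.array([2., 3., 2.5])),
               (np.array([20., 22., 21., 19.]), np.array([5., 6., 5.5, 4.5]))]
    print(stratified_reg_estimates(samples, X_bars=[2.6, 5.4],
                                   N_strata=[120, 180], beta0=[2.0, 2.5]))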
Sampling Theory
MODULE VII
LECTURE - 23
VARYING PROBABILITY
SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
The simple random sampling scheme provides a random sample where every unit in the population has equal probability

of selection. Under certain circumstances, more efficient estimators are obtained by assigning unequal probabilities of

selection to the units in the population. This type of sampling is known as varying probability sampling scheme.

If $y$ is the variable under study and $x$ is an auxiliary variable related to $y$, then in the most commonly used varying probability scheme the units are selected with probability proportional to the value of $x$, called the size. This is termed probability proportional to a given measure of size (pps) sampling. If the sampling units vary considerably in size, then SRS does not take into account the possible importance of the larger units in the population. A large unit, i.e., a unit with a large value of $y$, contributes more to the population total than smaller units, so it is natural to expect that a selection scheme that assigns a higher probability of inclusion to larger units would provide more efficient estimators than estimators based on an equal probability scheme. This is accomplished through pps sampling.

Note that the size considered is the value of the auxiliary variable $x$ and not the value of the study variable $y$. For example, in an agricultural survey, the yield depends on the area under cultivation: fields with larger areas are likely to yield more and will contribute more towards the population total, so the value of the cultivated area can be considered as the size. The cultivated area in a previous period can also be taken as the size while estimating the yield of a crop. Similarly, in an industrial survey, the number of workers in a factory can be considered as the measure of size while studying the industrial output of the respective factory.


Difference between the methods of SRS and varying probability scheme:

In SRS, the probability of drawing a specified unit at any given draw is the same. In varying probability scheme, the

probability of drawing a specified unit differs from draw to draw.

It appears in pps sampling that such procedure would give biased estimators as the larger units are over-represented

and the smaller units are under-represented in the sample. This will happen in case of sample mean as an estimator

of population mean where all the units are given equal weight. Instead of giving equal weights to all the units, if the

sample observations are suitably weighted at the estimation stage by taking the probabilities of selection into account,

then it is possible to obtain unbiased estimators.

In pps sampling, there are two possibilities to draw the sample, i.e., with replacement and without replacement.

Selection of units with replacement:

The probability of selection of a unit will not change and the probability of selecting a specified unit is same at any

stage. There is no redistribution of the probabilities after a draw.

Selection of units without replacement:


The probability of selection of a unit will change at any stage and the probabilities are redistributed after each draw.

Sampling PPS WOR is more complex than PPS WR. We consider both cases separately.
PPS sampling with replacement (WR):

First we discuss two methods to draw a sample with PPS and WR.

1. Cumulative total method:

The procedure for selecting a simple random sample of size $n$ consists of
• associating the natural numbers 1 to $N$ with the units in the population, and
• then selecting those $n$ units whose serial numbers correspond to a set of $n$ numbers, each less than or equal to $N$, drawn from a random number table.

In the selection of a sample with varying probabilities, the procedure is to associate with each unit a set of consecutive natural numbers, the size of the set being proportional to the desired probability. If $X_1, X_2, \ldots, X_N$ are positive integers proportional to the probabilities assigned to the $N$ units in the population, then a possible way is to associate the cumulative totals of the sizes with the units and select units based on the values of these cumulative totals. This is illustrated in the following table:


Unit | Size | Cumulative size
1 | $X_1$ | $T_1 = X_1$
2 | $X_2$ | $T_2 = X_1+X_2$
⋮ | ⋮ | ⋮
$i$ | $X_i$ | $T_i = \sum_{j=1}^{i}X_j$
⋮ | ⋮ | ⋮
$N$ | $X_N$ | $T_N = \sum_{j=1}^{N}X_j$

Procedure:
• Select a random number $R$ between 1 and $T_N$ using a random number table.
• If $T_{i-1} < R \le T_i$, then the $i$th unit is selected, which happens with probability $X_i/T_N$, $i = 1,2,\ldots,N$.
• Repeat the procedure $n$ times to get a sample of size $n$.
In this case, the probability of selection of the $i$th unit is
$$P_i = \frac{T_i-T_{i-1}}{T_N} = \frac{X_i}{T_N} \;\Rightarrow\; P_i \propto X_i.$$
Note that $T_N$ is the population total of the sizes, which remains constant.

Drawback: This procedure involves writing down the successive cumulative totals, which is time consuming and tedious if the number of units in the population is large. This problem is overcome in Lahiri's method.
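A short Python sketch of the cumulative total method (numpy assumed; the sizes are hypothetical). Note that np.searchsorted on the cumulative totals implements exactly the rule $T_{i-1} < R \le T_i$:

    import numpy as np

    def cumulative_total_pps_wr(sizes, n, rng):
        """Draw a pps-with-replacement sample by the cumulative total method."""
        T = np.cumsum(sizes)                     # T_1, ..., T_N
        picks = []
        for _ in range(n):
            R = rng.integers(1, T[-1] + 1)       # random number in 1..T_N
            i = int(np.searchsorted(T, R))       # first i with R <= T_i
            picks.append(i)                      # 0-based unit index
        return picks

    rng = np.random.default_rng(3)
    sizes = np.array([2, 5, 10, 4, 7, 12, 3, 14, 11, 6])
    print(cumulative_total_pps_wr(sizes, n=2, rng=rng))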
Lahiri's method:

Let $M = \max_{i=1,2,\ldots,N}X_i$, i.e., the maximum of the sizes of the $N$ units in the population, or some convenient number greater than this maximum.

The sampling procedure has the following steps:
1. Select a pair of random numbers $(i,j)$ such that $1 \le i \le N$, $1 \le j \le M$.
2. If $j \le X_i$, then the $i$th unit is selected; otherwise the pair is rejected and another pair of random numbers is chosen.
3. To get a sample of size $n$, this procedure is repeated till $n$ units are selected.

Now we see how this method ensures that the probabilities of selection of the units are varying and proportional to size. Selection of the $i$th unit can occur either at the first trial or at a subsequent trial preceded by ineffective trials. The probability of selecting the $i$th unit at a given trial is
$$P(1\le i\le N)\,P(1\le j\le M\mid i) = \frac{1}{N}\cdot\frac{X_i}{M} = P_i^*, \text{ say}.$$
The probability that no unit is selected at a trial is
$$\frac{1}{N}\sum_{i=1}^{N}\left(1-\frac{X_i}{M}\right) = \frac{1}{N}\left(N-\frac{N\bar{X}}{M}\right) = 1-\frac{\bar{X}}{M} = Q, \text{ say}.$$
The probability that unit $i$ is selected at a draw (all previous trials resulting in non-selection) is
$$P_i^* + QP_i^* + Q^2P_i^* + \cdots = \frac{P_i^*}{1-Q} = \frac{X_i/(NM)}{\bar{X}/M} = \frac{X_i}{N\bar{X}} = \frac{X_i}{X_{total}} \propto X_i.$$
Thus the probability of selection of unit $i$ is proportional to its size $X_i$, so this method generates a pps sample.

Advantages:
1. It does not require writing down the cumulative totals for all the units.
2. The sizes of all the units need not be known beforehand. We need only some number greater than the maximum size and the sizes of those units which are selected by the choice of the first random number from 1 to $N$.

Disadvantages:
It results in a wastage of time and effort if many trials get rejected.
$$\text{The probability of rejection at a trial} = 1-\frac{\bar{X}}{M}.$$
$$\text{The expected number of trials required to select one unit} = \frac{M}{\bar{X}}.$$
This number is large if $M$ is much larger than $\bar{X}$.
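Lahiri's accept/reject logic is just as short in code. A minimal Python sketch (numpy assumed; the sizes are the same hypothetical values as above):

    import numpy as np

    def lahiri_pps_wr(sizes, n, rng):
        """Draw a pps-with-replacement sample by Lahiri's method."""
        N = len(sizes)
        M = int(max(sizes))                  # or any convenient number >= max size
        picks = []
        while len(picks) < n:
            i = rng.integers(0, N)           # unit index (0-based)
            j = rng.integers(1, M + 1)       # 1 <= j <= M
            if j <= sizes[i]:                # accept the trial
                picks.append(i)
        return picks

    rng = np.random.default_rng(5)
    sizes = [2, 5, 10, 4, 7, 12, 3, 14, 11, 6]
    print(lahiri_pps_wr(sizes, n=2, rng=rng))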
Example: Consider the following data set of 10 factories with the number of workers (the size variable $X$) and industrial production ($Y$).

Factory no. | Number of workers $X$ (in thousands) | Industrial production $Y$ (in metric tons) | Cumulative total of sizes
1 | 2 | 30 | $T_1 = 2$
2 | 5 | 60 | $T_2 = 2+5 = 7$
3 | 10 | 12 | $T_3 = 7+10 = 17$
4 | 4 | 6 | $T_4 = 17+4 = 21$
5 | 7 | 8 | $T_5 = 21+7 = 28$
6 | 12 | 13 | $T_6 = 28+12 = 40$
7 | 3 | 4 | $T_7 = 40+3 = 43$
8 | 14 | 17 | $T_8 = 43+14 = 57$
9 | 11 | 13 | $T_9 = 57+11 = 68$
10 | 6 | 8 | $T_{10} = 68+6 = 74$
Selection of the sample using the cumulative total method:

1. First draw: Draw a random number between 1 and 74.
   - Suppose it is 23.
   - $T_4 < 23 \le T_5$, so unit 5 is selected and $Y_5 = 8$ enters the sample.
2. Second draw: Draw a random number between 1 and 74.
   - Suppose it is 38.
   - $T_5 < 38 \le T_6$, so unit 6 is selected and $Y_6 = 13$ enters the sample, and so on.

This procedure is repeated till the sample of the required size is obtained.

Selection of the sample using Lahiri's method:

In this case
$$M = \max_{i=1,2,\ldots,10}X_i = 14,$$
so we need to select a pair of random numbers $(i,j)$ such that $1 \le i \le 10$, $1 \le j \le 14$. The following table shows a sample obtained by Lahiri's scheme:
Random number $1\le i\le 10$ | Random number $1\le j\le 14$ | Observation | Selection of unit
3 | 7 | $j = 7 \le X_3 = 10$ | trial accepted ($y_3$)
6 | 13 | $j = 13 > X_6 = 12$ | trial rejected
4 | 7 | $j = 7 > X_4 = 4$ | trial rejected
2 | 9 | $j = 9 > X_2 = 5$ | trial rejected
9 | 2 | $j = 2 \le X_9 = 11$ | trial accepted ($y_9$)

and so on. Here $(y_3, y_9)$ are selected into the sample.
Sampling Theory
MODULE VII
LECTURE - 24
VARYING PROBABILITY
SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Varying probability scheme with replacement:

Estimation of the population mean

Let
$y$ : variable under study,
$Y_i$ : value of the study variable $y$ for the $i$th unit of the population, $i = 1,2,\ldots,N$,
$X_i$ : value of the auxiliary variable $x$ for the $i$th unit,
$P_i$ : probability of selection of the $i$th unit in the population at any given draw.

Define
$$z_i = \frac{y_i}{NP_i}, \quad i = 1,2,\ldots,N.$$
Under the varying probability scheme with replacement, for a sample of size $n$,
$$\bar{z} = \frac{1}{n}\sum_{i=1}^{n}z_i$$
is an unbiased estimator of the population mean $\bar{Y}$; the variance of $\bar{z}$ is $\frac{\sigma_z^2}{n}$, where
$$\sigma_z^2 = \sum_{i=1}^{N}P_i\left(\frac{Y_i}{NP_i}-\bar{Y}\right)^2,$$
and an unbiased estimate of the variance of $\bar{z}$ is
$$\frac{s_z^2}{n} = \frac{1}{n}\cdot\frac{1}{n-1}\sum_{i=1}^{n}(z_i-\bar{z})^2.$$
Proof:
$$E(\bar{z}) = \frac{1}{n}\sum_{i=1}^{n}E\left(\frac{y_i}{NP_i}\right) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{Y_1}{NP_1}P_1+\frac{Y_2}{NP_2}P_2+\cdots+\frac{Y_N}{NP_N}P_N\right] = \frac{1}{n}\sum_{i=1}^{n}\bar{Y} = \bar{Y}.$$
The variance of $\bar{z}$ is
$$Var(\bar{z}) = \frac{1}{n^2}Var\left(\sum_{i=1}^{n}z_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}Var(z_i) \qquad (\text{the } z_i\text{'s are independent in the WR case})$$
$$= \frac{1}{n^2}\sum_{i=1}^{n}E\left[z_i-E(z_i)\right]^2 = \frac{1}{n^2}\sum_{i=1}^{n}E(z_i-\bar{Y})^2$$
$$= \frac{1}{n^2}\sum_{i=1}^{n}\left[\left(\frac{Y_1}{NP_1}-\bar{Y}\right)^2P_1+\left(\frac{Y_2}{NP_2}-\bar{Y}\right)^2P_2+\cdots+\left(\frac{Y_N}{NP_N}-\bar{Y}\right)^2P_N\right]$$
$$= \frac{1}{n}\sum_{i=1}^{N}\left(\frac{Y_i}{NP_i}-\bar{Y}\right)^2P_i = \frac{\sigma_z^2}{n}.$$
To show that $\frac{s_z^2}{n}$ is an unbiased estimator of the variance of $\bar{z}$, consider
$$(n-1)E(s_z^2) = E\left[\sum_{i=1}^{n}(z_i-\bar{z})^2\right] = E\left[\sum_{i=1}^{n}z_i^2-n\bar{z}^2\right] = \sum_{i=1}^{n}E(z_i^2)-nE(\bar{z}^2)$$
$$= \sum_{i=1}^{n}\left[Var(z_i)+\{E(z_i)\}^2\right]-n\left[Var(\bar{z})+\{E(\bar{z})\}^2\right]$$
$$= \sum_{i=1}^{n}\left(\sigma_z^2+\bar{Y}^2\right)-n\left(\frac{\sigma_z^2}{n}+\bar{Y}^2\right) \qquad \left(\text{using } Var(z_i)=\sum_{i=1}^{N}\left(\frac{Y_i}{NP_i}-\bar{Y}\right)^2P_i=\sigma_z^2\right)$$
$$= (n-1)\sigma_z^2,$$
so $E(s_z^2) = \sigma_z^2$, i.e.,
$$E\left(\frac{s_z^2}{n}\right) = \frac{\sigma_z^2}{n} = Var(\bar{z}) \;\Rightarrow\; \widehat{Var}(\bar{z}) = \frac{s_z^2}{n} = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n}\left(\frac{y_i}{NP_i}\right)^2-n\bar{z}^2\right].$$

Note: If $P_i = 1/N$, then $\bar{z} = \bar{y}$ and
$$Var(\bar{z}) = \frac{1}{n}\cdot\frac{1}{N}\sum_{i=1}^{N}\left(\frac{Y_i}{N\cdot\frac{1}{N}}-\bar{Y}\right)^2 = \frac{\sigma_y^2}{n},$$
which is the same as in the case of SRSWR.
Estimation of the population total:

An estimate of the population total is
$$\hat{Y}_{tot} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{P_i} = N\bar{z}.$$
Taking expectation, we get
$$E(\hat{Y}_{tot}) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{Y_1}{P_1}P_1+\frac{Y_2}{P_2}P_2+\cdots+\frac{Y_N}{P_N}P_N\right] = \sum_{i=1}^{N}Y_i = Y_{tot}.$$
Thus $\hat{Y}_{tot}$ is an unbiased estimator of the population total. Its variance is
$$Var(\hat{Y}_{tot}) = N^2Var(\bar{z}) = N^2\cdot\frac{1}{n}\sum_{i=1}^{N}\frac{1}{N^2}\left(\frac{Y_i}{P_i}-N\bar{Y}\right)^2P_i = \frac{1}{n}\sum_{i=1}^{N}\left(\frac{Y_i}{P_i}-Y_{tot}\right)^2P_i = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i}-Y_{tot}^2\right].$$
An estimate of the variance of $\hat{Y}_{tot}$ is
$$\widehat{Var}(\hat{Y}_{tot}) = \frac{N^2s_z^2}{n}.$$
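The whole pps-WR workflow, selection and estimation, fits in a few lines. A minimal Python simulation sketch (numpy assumed; the synthetic population makes $y$ roughly proportional to the size variable, the situation where pps pays off):

    import numpy as np

    rng = np.random.default_rng(11)

    # Synthetic population: y roughly proportional to the size variable x.
    N, n = 800, 40
    x = rng.gamma(3.0, 5.0, size=N)
    y = 2.0 * x * rng.normal(1.0, 0.1, size=N)
    P = x / x.sum()                              # selection probabilities, sum to 1

    s = rng.choice(N, size=n, replace=True, p=P) # pps with replacement
    z = y[s] / (N * P[s])
    z_bar = z.mean()                             # unbiased estimate of Y-bar
    var_hat = np.sum((z - z_bar) ** 2) / (n * (n - 1))

    print("estimate of mean :", z_bar, " true:", y.mean())
    print("estimate of total:", N * z_bar, " true:", y.sum())
    print("estimated variance of the mean estimate:", var_hat)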
Varying probability scheme without replacement

In varying probability scheme without replacement, when initial probabilities of selection are unequal, the probability
of drawing a specified unit of the population at a given draw changes with the draw. Generally, the sampling WOR
provides a more efficient estimator than sampling WR. The estimators for population mean and variance are more
complicated. So this scheme is not commonly used in practice, especially in large scale sample surveys with small
sampling fractions.

Let
$U_i$ : the $i$th unit,
$P_i$ : probability of selection of $U_i$ at the first draw, $i = 1,2,\ldots,N$, with $\sum_{i=1}^{N}P_i = 1$,
$P_i(r)$ : probability of selecting $U_i$ at the $r$th draw, so that $P_i(1) = P_i$.
Consider $P_i(2)$, the probability of selection of $U_i$ at the second draw. Such an event can occur in the following possible ways: $U_1$ is selected at the first draw and $U_i$ at the second draw; $U_2$ is selected at the first draw and $U_i$ at the second draw; $\ldots$; $U_{i-1}$ at the first and $U_i$ at the second; $U_{i+1}$ at the first and $U_i$ at the second; $\ldots$; $U_N$ at the first and $U_i$ at the second.

So $P_i(2)$ can be expressed as
$$P_i(2) = P_1\frac{P_i}{1-P_1}+P_2\frac{P_i}{1-P_2}+\cdots+P_{i-1}\frac{P_i}{1-P_{i-1}}+P_{i+1}\frac{P_i}{1-P_{i+1}}+\cdots+P_N\frac{P_i}{1-P_N}$$
$$= \sum_{j(\ne i)=1}^{N}P_j\frac{P_i}{1-P_j} = \sum_{j=1}^{N}P_j\frac{P_i}{1-P_j}-P_i\frac{P_i}{1-P_i} = P_i\left[\sum_{j=1}^{N}\frac{P_j}{1-P_j}-\frac{P_i}{1-P_i}\right].$$
$P_i(2) \ne P_i(1)$ for all $i$ unless $P_i = \frac{1}{N}$. In general, $P_i(2)$ will be different for each $i = 1,2,\ldots,N$, so $E\left(\frac{y_i}{NP_i}\right)$ changes with successive draws. This makes the varying probability scheme WOR more complex. Only $\frac{y_1}{NP_1}$ provides an unbiased estimator of $\bar{Y}$; in general, $\frac{y_i}{NP_i}$ $(i\ne 1)$ will not provide an unbiased estimator of $\bar{Y}$.

Ordered estimates

To overcome the difficulty of the changing expectation with each draw, we associate with each draw a new variate whose expectation equals the population value of the variate under study. Such estimators take into account the order of the draws and are called ordered estimates. We consider the ordered estimators proposed by Des Raj, first for the case of two draws and then for the general case.
Des Raj ordered estimator

Case 1: Case of two draws:

Let $y_1$ and $y_2$ denote the values of the units drawn at the first and second draws respectively. Note that $y_1$ and $y_2$ are not the values of the first two units in the population. Further, let $P_1$ and $P_2$ denote the initial probabilities of selection of those units. Consider the estimators
$$z_1 = \frac{y_1}{NP_1},$$
$$z_2 = \frac{1}{N}\left[y_1+\frac{y_2}{P_2/(1-P_1)}\right] = \frac{1}{N}\left[y_1+y_2\frac{1-P_1}{P_2}\right],$$
$$\bar{z} = \frac{z_1+z_2}{2}.$$
Note that $\frac{P_2}{1-P_1}$ is the conditional probability of selecting the second unit given that the first unit has been selected, i.e., $P(U_2|U_1)$.
Estimation of the population mean:

First we show that $\bar{z}$ is an unbiased estimator of $\bar{Y}$, i.e., $E(\bar{z}) = \bar{Y}$. Note that $\sum_{i=1}^{N}P_i = 1$. Consider
$$E(z_1) = \frac{1}{N}E\left(\frac{y_1}{P_1}\right) = \frac{1}{N}\left[\frac{Y_1}{P_1}P_1+\frac{Y_2}{P_2}P_2+\cdots+\frac{Y_N}{P_N}P_N\right] = \bar{Y},$$
$$E(z_2) = \frac{1}{N}E\left[y_1+y_2\frac{1-P_1}{P_2}\right] = \frac{1}{N}\left[E(y_1)+E\left\{E\left(y_2\frac{1-P_1}{P_2}\Big|U_1\right)\right\}\right] \quad(\text{using } E(Y)=E_X[E_Y(Y|X)]).$$
Given the first draw, $y_2$ takes all values $y_j$ except $y_1$, the unit with initial probability $P_j$ being drawn with probability $\frac{P_j}{1-P_1}$. So
$$E\left(y_2\frac{1-P_1}{P_2}\Big|U_1\right) = \sum_{j}\left[y_j\frac{1-P_1}{P_j}\cdot\frac{P_j}{1-P_1}\right] = Y_{tot}-y_1,$$
where the summation is over all values of $y$ except the value $y_1$ selected at the first draw. Substituting this in $E(z_2)$, we have
$$E(z_2) = \frac{1}{N}\left[E(y_1)+E(Y_{tot}-y_1)\right] = \frac{1}{N}E(Y_{tot}) = \frac{Y_{tot}}{N} = \bar{Y}.$$
Thus
$$E(\bar{z}) = \frac{E(z_1)+E(z_2)}{2} = \frac{\bar{Y}+\bar{Y}}{2} = \bar{Y}.$$
Sampling Theory
MODULE VII
LECTURE - 25
VARYING PROBABILITY
SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Variance:

The variance of $\bar{z}$ for the case of two draws is given as
$$Var(\bar{z}) = \left(1-\frac{1}{2}\sum_{i=1}^{N}P_i^2\right)\left[\frac{1}{2N^2}\sum_{i=1}^{N}P_i\left(\frac{Y_i}{P_i}-Y_{tot}\right)^2\right]-\frac{1}{4N^2}\sum_{i=1}^{N}P_i^2\left(\frac{Y_i}{P_i}-Y_{tot}\right)^2.$$

Proof: Before starting the proof, we note the property
$$\sum_{i\ne j=1}^{N}a_ib_j = \sum_{i=1}^{N}a_i\left(\sum_{j=1}^{N}b_j-b_i\right),$$
which is used in the proof. The variance of $\bar{z}$ is
$$Var(\bar{z}) = E(\bar{z}^2)-[E(\bar{z})]^2 = E\left[\frac{1}{2N}\left(\frac{y_1}{P_1}+y_1+\frac{y_2(1-P_1)}{P_2}\right)\right]^2-\bar{Y}^2 = \frac{1}{4N^2}E\left[\frac{y_1(1+P_1)}{P_1}+\frac{y_2(1-P_1)}{P_2}\right]^2-\bar{Y}^2,$$
where the first term inside the brackets depends only on the first draw while the second depends on the first and second draws. Since the ordered pair $(Y_i,Y_j)$ is drawn with probability $\frac{P_iP_j}{1-P_i}$,
$$Var(\bar{z}) = \frac{1}{4N^2}\sum_{i\ne j=1}^{N}\left[\frac{Y_i(1+P_i)}{P_i}+\frac{Y_j(1-P_i)}{P_j}\right]^2\frac{P_iP_j}{1-P_i}-\bar{Y}^2$$
$$= \frac{1}{4N^2}\sum_{i\ne j=1}^{N}\left[\frac{Y_i^2(1+P_i)^2}{P_i(1-P_i)}P_j+\frac{Y_j^2(1-P_i)P_i}{P_j}+2Y_iY_j(1+P_i)\right]-\bar{Y}^2.$$
Applying the property above to each of the three sums and collecting terms,
$$Var(\bar{z}) = \frac{1}{4N^2}\left[2\sum_{i=1}^{N}\frac{Y_i^2}{P_i}-\sum_{i=1}^{N}P_i^2\sum_{j=1}^{N}\frac{Y_j^2}{P_j}-\sum_{i=1}^{N}Y_i^2+2Y_{tot}^2+2Y_{tot}\sum_{i=1}^{N}Y_iP_i\right]-\bar{Y}^2.$$
Rearranging the terms (using $\bar{Y} = Y_{tot}/N$) gives the stated expression, which can also be written as
$$Var(\bar{z}) = \frac{1}{2}\sum_{i=1}^{N}P_i\left(\frac{Y_i}{NP_i}-\bar{Y}\right)^2-\frac{1}{4N^2}\left[\sum_{i=1}^{N}P_i^2\sum_{i=1}^{N}P_i\left(\frac{Y_i}{P_i}-Y_{tot}\right)^2+\sum_{i=1}^{N}P_i^2\left(\frac{Y_i}{P_i}-Y_{tot}\right)^2\right].$$
The first term is the variance of the corresponding WR estimator for $n = 2$; the second term is the reduction in variance achieved by sampling WOR with varying probabilities.
Estimation of $Var(\bar{z})$:
$$Var(\bar{z}) = E(\bar{z}^2)-[E(\bar{z})]^2 = E(\bar{z}^2)-\bar{Y}^2.$$
Since
$$E(z_1z_2) = E\left[z_1E(z_2|u_1)\right] = E\left[z_1\bar{Y}\right] = \bar{Y}E(z_1) = \bar{Y}^2,$$
consider
$$E\left[\bar{z}^2-z_1z_2\right] = E(\bar{z}^2)-E(z_1z_2) = E(\bar{z}^2)-\bar{Y}^2 = Var(\bar{z}),$$
so $\widehat{Var}(\bar{z}) = \bar{z}^2-z_1z_2$ is an unbiased estimator of $Var(\bar{z})$.

Alternative form of the estimate of $Var(\bar{z})$:
$$\widehat{Var}(\bar{z}) = \bar{z}^2-z_1z_2 = \left(\frac{z_1+z_2}{2}\right)^2-z_1z_2 = \frac{(z_1-z_2)^2}{4}$$
$$= \frac{1}{4}\left[\frac{y_1}{NP_1}-\frac{y_1}{N}-\frac{y_2}{N}\cdot\frac{1-P_1}{P_2}\right]^2 = \frac{1}{4N^2}\left[(1-P_1)\frac{y_1}{P_1}-\frac{y_2(1-P_1)}{P_2}\right]^2 = \frac{(1-P_1)^2}{4N^2}\left(\frac{y_1}{P_1}-\frac{y_2}{P_2}\right)^2.$$
Case 2: General case for $n$ draws:

Let $y_1, y_2, \ldots, y_n$ be the values of the units in the order in which they are drawn and $P_1, P_2, \ldots, P_n$ their respective initial probabilities of selection. Define
$$z_1 = \frac{y_1}{NP_1},$$
$$z_r = \frac{1}{N}\left[y_1+y_2+\cdots+y_{r-1}+\frac{y_r}{P_r}(1-P_1-P_2-\cdots-P_{r-1})\right], \quad r = 2,3,\ldots,n,$$
$$\bar{z} = \frac{1}{n}\sum_{i=1}^{n}z_i.$$

Estimation of the population mean

The estimator $\bar{z}$ is an unbiased estimator of $\bar{Y}$. We have $E(z_1) = \bar{Y}$ (already shown earlier). For $r \ge 2$, note that, given the outcomes of the first $r-1$ draws, $y_r$ takes all values except those selected in the previous draws, the unit with initial probability $P_r$ being drawn with conditional probability $\frac{P_r}{1-P_1-P_2-\cdots-P_{r-1}}$. Hence
$$E(Nz_r|y_1,\ldots,y_{r-1}) = (y_1+\cdots+y_{r-1})+\frac{(1-P_1-\cdots-P_{r-1})\left[Y_{tot}-(y_1+\cdots+y_{r-1})\right]}{1-P_1-\cdots-P_{r-1}} = Y_{tot},$$
so
$$E(z_r) = \frac{1}{N}E(Y_{tot}) = \bar{Y}, \qquad E(\bar{z}) = \frac{1}{n}\sum_{i=1}^{n}E(z_i) = \frac{1}{n}\cdot n\bar{Y} = \bar{Y}.$$
The expression for the variance of $\bar{z}$ in the general case is complicated, but its estimate is simple.
Estimate of variance:
$$Var(\bar{z}) = E(\bar{z}^2)-\bar{Y}^2.$$
Consider, for $r < s$,
$$E(z_rz_s) = E\left[z_rE(z_s|y_1,y_2,\ldots,y_{s-1})\right] = E\left[z_r\bar{Y}\right] = \bar{Y}E(z_r) = \bar{Y}^2,$$
and similarly, for $s < r$,
$$E(z_rz_s) = E\left[z_sE(z_r|y_1,y_2,\ldots,y_{r-1})\right] = \bar{Y}E(z_s) = \bar{Y}^2.$$
Consider
$$E\left[\frac{1}{n(n-1)}\sum_{r(\ne s)=1}^{n}\sum_{s=1}^{n}z_rz_s\right] = \frac{1}{n(n-1)}\sum_{r(\ne s)=1}^{n}\sum_{s=1}^{n}E(z_rz_s) = \frac{1}{n(n-1)}\cdot n(n-1)\bar{Y}^2 = \bar{Y}^2.$$
Substituting this for $\bar{Y}^2$ in $Var(\bar{z})$, we get
$$\widehat{Var}(\bar{z}) = \bar{z}^2-\frac{1}{n(n-1)}\sum_{r(\ne s)=1}^{n}\sum_{s=1}^{n}z_rz_s.$$
Using
$$\left(\sum_{r=1}^{n}z_r\right)^2 = \sum_{r=1}^{n}z_r^2+\sum_{r(\ne s)=1}^{n}\sum_{s=1}^{n}z_rz_s \;\Rightarrow\; \sum_{r(\ne s)=1}^{n}\sum_{s=1}^{n}z_rz_s = n^2\bar{z}^2-\sum_{r=1}^{n}z_r^2,$$
the expression of $\widehat{Var}(\bar{z})$ can be further simplified as
$$\widehat{Var}(\bar{z}) = \bar{z}^2-\frac{1}{n(n-1)}\left[n^2\bar{z}^2-\sum_{r=1}^{n}z_r^2\right] = \frac{1}{n(n-1)}\left[\sum_{r=1}^{n}z_r^2-n\bar{z}^2\right] = \frac{1}{n(n-1)}\sum_{r=1}^{n}(z_r-\bar{z})^2.$$
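The Des Raj estimator and its variance estimate translate directly into code. A minimal Python sketch (numpy assumed; the sample values and probabilities are hypothetical):

    import numpy as np

    def des_raj_estimate(y, P, N):
        """Des Raj ordered estimate of Y-bar and its variance estimate.

        y : sample y-values in the order drawn (ppswor)
        P : initial selection probabilities of those units, same order
        N : population size
        """
        y, P = np.asarray(y, float), np.asarray(P, float)
        n = y.size
        z = np.empty(n)
        z[0] = y[0] / (N * P[0])
        for r in range(1, n):
            rest = 1.0 - P[:r].sum()
            z[r] = (y[:r].sum() + y[r] * rest / P[r]) / N
        z_bar = z.mean()
        var_hat = np.sum((z - z_bar) ** 2) / (n * (n - 1))
        return z_bar, var_hat

    print(des_raj_estimate(y=[30.0, 17.0, 13.0],
                           P=[0.03, 0.22, 0.17], N=10))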
Sampling Theory
MODULE VII
LECTURE - 26
VARYING PROBABILITY
SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Unordered estimator:

In an ordered estimator, the order in which the units are drawn is taken into account. Corresponding to any ordered estimator, there exists an unordered estimator that does not depend on the order in which the units are drawn and has variance no larger than that of the ordered estimator.

In the case of sampling WOR from a population of size $N$, there are $\binom{N}{n}$ unordered samples of size $n$. Corresponding to any unordered sample of $n$ units, there are $n!$ ordered samples.

For example, for $n = 2$, if the units are $u_1$ and $u_2$, then
• there are $2! = 2$ ordered samples: $(u_1,u_2)$ and $(u_2,u_1)$;
• there is one unordered sample $\{u_1,u_2\}$.
Moreover,
$$P\left(\text{unordered sample }\{u_1,u_2\}\right) = P\left(\text{ordered sample }(u_1,u_2)\right)+P\left(\text{ordered sample }(u_2,u_1)\right).$$

For $n = 3$, with units $u_1, u_2, u_3$,
• there are $3! = 6$ ordered samples: $(u_1,u_2,u_3), (u_1,u_3,u_2), (u_2,u_1,u_3), (u_2,u_3,u_1), (u_3,u_1,u_2), (u_3,u_2,u_1)$;
• there is one unordered sample $\{u_1,u_2,u_3\}$.
Moreover, the probability of the unordered sample is the sum of the probabilities of the ordered samples, i.e.,
$$P(u_1,u_2,u_3)+P(u_1,u_3,u_2)+P(u_2,u_1,u_3)+P(u_2,u_3,u_1)+P(u_3,u_1,u_2)+P(u_3,u_2,u_1).$$
Let $z_{si}$, $s = 1,2,\ldots,\binom{N}{n}$, $i = 1,2,\ldots,n!\,(=M)$, be an estimator of the population parameter $\theta$ based on the ordered sample $si$. Consider a scheme of selection in which the probability of selecting the ordered sample $si$ is $p_{si}$. The probability of getting the corresponding unordered sample $s$ is the sum of these probabilities, i.e.,
$$p_s = \sum_{i=1}^{M}p_{si}.$$
For a population of size $N$ with units denoted $1,2,\ldots,N$, the ordered samples of size $n$ are $n$-tuples, and there are $N(N-1)\cdots(N-n+1)$ of them. If all ordered samples are equally likely (as under SRSWOR),
$$p_{si}^{o} = P\left[\text{selection of any ordered sample}\right] = \frac{1}{N(N-1)\cdots(N-n+1)},$$
$$p_{si}^{u} = P\left[\text{selection of any unordered sample}\right] = \frac{n!}{N(N-1)\cdots(N-n+1)} = n!\,P\left[\text{selection of any ordered sample}\right],$$
and then
$$p_s = \sum_{i=1}^{M(=n!)}p_{si}^{o} = \frac{n!(N-n)!}{N!} = \frac{1}{\binom{N}{n}}.$$
Theorem: If $\hat{\theta}_0 = z_{si}$, $s = 1,2,\ldots,\binom{N}{n}$, $i = 1,2,\ldots,M(=n!)$, and $\hat{\theta}_u = \sum_{i=1}^{M}z_{si}p'_{si}$ are the ordered and unordered estimators of $\theta$, then
(i) $E(\hat{\theta}_u) = E(\hat{\theta}_0)$,
(ii) $Var(\hat{\theta}_u) \le Var(\hat{\theta}_0)$,
where $z_{si}$ is a function of the $si$th ordered sample (hence a random variable), $p_{si}$ is the probability of selection of the $si$th ordered sample, and $p'_{si} = \frac{p_{si}}{p_s}$.

Proof: The total number of ordered samples is $n!\binom{N}{n}$.

(i)
$$E(\hat{\theta}_0) = \sum_{s=1}^{\binom{N}{n}}\sum_{i=1}^{M}z_{si}p_{si},$$
$$E(\hat{\theta}_u) = \sum_{s=1}^{\binom{N}{n}}\left[\sum_{i=1}^{M}z_{si}p'_{si}\right]p_s = \sum_{s}\left[\sum_{i}z_{si}\frac{p_{si}}{p_s}\right]p_s = \sum_{s}\sum_{i}z_{si}p_{si} = E(\hat{\theta}_0).$$
(ii) Since $\hat{\theta}_0 = z_{si}$, we have $\hat{\theta}_0^2 = z_{si}^2$ with probability $p_{si}$, $i = 1,2,\ldots,M$, $s = 1,2,\ldots,\binom{N}{n}$. Similarly, $\hat{\theta}_u = \sum_{i=1}^{M}z_{si}p'_{si}$, so $\hat{\theta}_u^2 = \left[\sum_{i=1}^{M}z_{si}p'_{si}\right]^2$ with probability $p_s$. Consider
$$Var(\hat{\theta}_0) = E(\hat{\theta}_0^2)-\left[E(\hat{\theta}_0)\right]^2 = \sum_{s}\sum_{i}z_{si}^2p_{si}-\left[E(\hat{\theta}_0)\right]^2,$$
$$Var(\hat{\theta}_u) = E(\hat{\theta}_u^2)-\left[E(\hat{\theta}_u)\right]^2 = \sum_{s}\left[\sum_{i}z_{si}p'_{si}\right]^2p_s-\left[E(\hat{\theta}_0)\right]^2.$$
Hence
$$Var(\hat{\theta}_0)-Var(\hat{\theta}_u) = \sum_{s}\sum_{i}z_{si}^2p_{si}-\sum_{s}\left[\sum_{i}z_{si}p'_{si}\right]^2p_s.$$
Writing $p_s = \sum_{i}p_{si}$ and using $p_{si} = p'_{si}p_s$, the right-hand side can be rearranged as
$$Var(\hat{\theta}_0)-Var(\hat{\theta}_u) = \sum_{s}\sum_{i}\left(z_{si}-\sum_{i}z_{si}p'_{si}\right)^2p_{si} \ge 0,$$
i.e., for each unordered sample the difference is the $p'_{si}$-weighted variance of the $z_{si}$ around their weighted mean $\sum_{i}z_{si}p'_{si}$, which is nonnegative. Thus
$$Var(\hat{\theta}_0)-Var(\hat{\theta}_u) \ge 0, \qquad\text{or}\qquad Var(\hat{\theta}_u) \le Var(\hat{\theta}_0).$$
Estimate of $Var(\hat{\theta}_u)$:

Since
$$Var(\hat{\theta}_0)-Var(\hat{\theta}_u) = \sum_{s}\sum_{i}\left(z_{si}-\sum_{i}z_{si}p'_{si}\right)^2p_{si},$$
an estimate of $Var(\hat{\theta}_u)$ based on the observed sample $s$ is obtained as
$$\widehat{Var}(\hat{\theta}_u) = \sum_{i}p'_{si}\widehat{Var}(\hat{\theta}_0)_{si}-\sum_{i}p'_{si}\left(z_{si}-\sum_{i}z_{si}p'_{si}\right)^2,$$
i.e., the $p'_{si}$-weighted average of the variance estimates of the ordered estimators, reduced by the weighted spread of the $z_{si}$ around their weighted mean.

Based on this result, we now use the ordered estimators to construct an unordered estimator. It follows from the theorem that the unordered estimator will be more efficient than the corresponding ordered estimators.
Murthy's unordered estimator corresponding to the Des Raj ordered estimator for sample size $n = 2$

Suppose $y_i$ and $y_j$ are the values of the units $u_i$ and $u_j$ selected at the first and second draws respectively, with varying probability and WOR, in a sample of size 2, and let $P_i$ and $P_j$ be the corresponding initial probabilities of selection. So now we have two ordered estimates corresponding to the ordered samples
$$s_1^* = (y_i,y_j) \text{ with } (u_i,u_j), \qquad s_2^* = (y_j,y_i) \text{ with } (u_j,u_i),$$
which are given as
$$z(s_1^*) = \frac{1}{2N}\left[(1+P_i)\frac{y_i}{P_i}+(1-P_i)\frac{y_j}{P_j}\right],$$
where the corresponding Des Raj estimator is $\frac{1}{2N}\left[y_i+\frac{y_i}{P_i}+\frac{y_j(1-P_i)}{P_j}\right]$, and
$$z(s_2^*) = \frac{1}{2N}\left[(1+P_j)\frac{y_j}{P_j}+(1-P_j)\frac{y_i}{P_i}\right],$$
where the corresponding Des Raj estimator is $\frac{1}{2N}\left[y_j+\frac{y_j}{P_j}+\frac{y_i(1-P_j)}{P_i}\right]$.
The probabilities corresponding to $z(s_1^*)$ and $z(s_2^*)$ are
$$p(s_1^*) = \frac{P_iP_j}{1-P_i}, \qquad p(s_2^*) = \frac{P_jP_i}{1-P_j},$$
$$p(s) = p(s_1^*)+p(s_2^*) = \frac{P_iP_j(2-P_i-P_j)}{(1-P_i)(1-P_j)},$$
$$p'(s_1^*) = \frac{p(s_1^*)}{p(s)} = \frac{1-P_j}{2-P_i-P_j}, \qquad p'(s_2^*) = \frac{p(s_2^*)}{p(s)} = \frac{1-P_i}{2-P_i-P_j}.$$
Murthy's unordered estimate $z(u)$ corresponding to the Des Raj ordered estimate is given as
$$z(u) = z(s_1^*)p'(s_1^*)+z(s_2^*)p'(s_2^*) = \frac{z(s_1^*)p(s_1^*)+z(s_2^*)p(s_2^*)}{p(s_1^*)+p(s_2^*)}$$
$$= \frac{\frac{1}{2N}\left[(1-P_j)\left\{(1+P_i)\frac{y_i}{P_i}+(1-P_i)\frac{y_j}{P_j}\right\}+(1-P_i)\left\{(1-P_j)\frac{y_i}{P_i}+(1+P_j)\frac{y_j}{P_j}\right\}\right]}{2-P_i-P_j}$$
$$= \frac{(1-P_j)\frac{y_i}{P_i}+(1-P_i)\frac{y_j}{P_j}}{N(2-P_i-P_j)}.$$
Unbiasedness:

The unordered sample $\{u_i,u_j\}$ occurs with probability $\frac{P_iP_j}{1-P_i}+\frac{P_jP_i}{1-P_j}$, so
$$E\left[z(u)\right] = \frac{1}{N}\sum_{i<j}\frac{\left[(1-P_j)\frac{y_i}{P_i}+(1-P_i)\frac{y_j}{P_j}\right]\left[\frac{P_iP_j}{1-P_i}+\frac{P_jP_i}{1-P_j}\right]}{2-P_i-P_j} = \frac{1}{2N}\sum_{i\ne j}\frac{\left[(1-P_j)\frac{y_i}{P_i}+(1-P_i)\frac{y_j}{P_j}\right]\left[\frac{P_iP_j}{1-P_i}+\frac{P_jP_i}{1-P_j}\right]}{2-P_i-P_j}.$$
Since
$$\frac{P_iP_j}{1-P_i}+\frac{P_jP_i}{1-P_j} = \frac{P_iP_j(2-P_i-P_j)}{(1-P_i)(1-P_j)},$$
this reduces to
$$E\left[z(u)\right] = \frac{1}{2N}\sum_{i\ne j}\left[(1-P_j)\frac{y_i}{P_i}+(1-P_i)\frac{y_j}{P_j}\right]\frac{P_iP_j}{(1-P_i)(1-P_j)} = \frac{1}{2N}\sum_{i\ne j}\left[\frac{y_iP_j}{1-P_i}+\frac{y_jP_i}{1-P_j}\right].$$
Using the result $\sum_{i\ne j=1}^{N}a_ib_j = \sum_{i=1}^{N}a_i\left(\sum_{j=1}^{N}b_j-b_i\right)$, we have
$$E\left[z(u)\right] = \frac{1}{2N}\left[\sum_{i=1}^{N}\frac{y_i}{1-P_i}\left(\sum_{j=1}^{N}P_j-P_i\right)+\sum_{j=1}^{N}\frac{y_j}{1-P_j}\left(\sum_{i=1}^{N}P_i-P_j\right)\right]$$
$$= \frac{1}{2N}\left[\sum_{i=1}^{N}\frac{y_i}{1-P_i}(1-P_i)+\sum_{j=1}^{N}\frac{y_j}{1-P_j}(1-P_j)\right] = \frac{1}{2N}\left[\sum_{i=1}^{N}y_i+\sum_{j=1}^{N}y_j\right] = \frac{\bar{Y}+\bar{Y}}{2} = \bar{Y}.$$
Variance: The variance of $z(u)$ can be found as
$$Var\left[z(u)\right] = \frac{1}{2}\sum_{i\ne j=1}^{N}\frac{P_iP_j(1-P_i-P_j)}{N^2(2-P_i-P_j)}\left(\frac{Y_i}{P_i}-\frac{Y_j}{P_j}\right)^2.$$
Using the theorem that $Var(\hat{\theta}_u) \le Var(\hat{\theta}_0)$, we get
$$Var\left[z(u)\right] \le Var\left[z(s_1^*)\right] \qquad\text{and}\qquad Var\left[z(u)\right] \le Var\left[z(s_2^*)\right].$$

Unbiased estimator of $Var\left[z(u)\right]$:

An unbiased estimator of $Var\left[z(u)\right]$ is
$$\widehat{Var}\left[z(u)\right] = \frac{(1-P_i-P_j)(1-P_i)(1-P_j)}{N^2(2-P_i-P_j)^2}\left(\frac{y_i}{P_i}-\frac{y_j}{P_j}\right)^2.$$
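For completeness, Murthy's estimate for two draws and the variance estimate above fit in a few lines of plain Python (all inputs hypothetical):

    def murthy_estimate_n2(yi, yj, Pi, Pj, N):
        """Murthy's unordered estimate of Y-bar for a ppswor sample of size 2,
        together with the unbiased variance estimate given in the text."""
        z_u = ((1 - Pj) * yi / Pi + (1 - Pi) * yj / Pj) / (N * (2 - Pi - Pj))
        var_hat = ((1 - Pi - Pj) * (1 - Pi) * (1 - Pj)
                   / (N ** 2 * (2 - Pi - Pj) ** 2)
                   * (yi / Pi - yj / Pj) ** 2)
        return z_u, var_hat

    print(murthy_estimate_n2(yi=30.0, yj=17.0, Pi=0.03, Pj=0.22, N=10))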
Sampling Theory
MODULE VII
LECTURE - 27
VARYING PROBABILITY
SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Horvitz-Thompson (HT) estimator of the population mean

The unordered estimators have limited applicability: they lack simplicity, and the expressions for the estimators and their variances become unmanageable when the sample size is even moderately large. The HT estimator is simpler than these estimators. Let $N$ be the population size, let $y_i\ (i = 1,2,\ldots,N)$ be the values of the characteristic under study, and let a sample of size $n$ be drawn by WOR using arbitrary probabilities of selection at each draw. Thus, prior to each succeeding draw, a new probability distribution is defined over the units available at that draw; the probability distribution at each draw may or may not depend upon the initial probabilities at the first draw.

Define a random variable $\alpha_i\ (i = 1,2,\ldots,N)$ as
$$\alpha_i = \begin{cases}1 & \text{if } Y_i \text{ is included in a sample of size } n,\\ 0 & \text{otherwise.}\end{cases}$$
Let $z_i = \frac{ny_i}{NE(\alpha_i)}$, $i = 1,\ldots,N$, assuming $E(\alpha_i) > 0$ for all $i$. The HT estimator of $\bar{Y}$ is
$$\bar{z}_n = \hat{Y}_{HT} = \frac{1}{n}\sum_{i=1}^{n}z_i = \frac{1}{n}\sum_{i=1}^{N}\alpha_iz_i.$$
Unbiasedness
$$E(\hat{Y}_{HT}) = \frac{1}{n}\sum_{i=1}^{N}E(z_i\alpha_i) = \frac{1}{n}\sum_{i=1}^{N}z_iE(\alpha_i) = \frac{1}{n}\sum_{i=1}^{N}\frac{ny_i}{NE(\alpha_i)}E(\alpha_i) = \frac{1}{n}\sum_{i=1}^{N}\frac{ny_i}{N} = \bar{Y},$$
which shows that the HT estimator is an unbiased estimator of the population mean.

Variance
$$Var(\hat{Y}_{HT}) = Var(\bar{z}_n) = E(\bar{z}_n^2)-\left[E(\bar{z}_n)\right]^2 = E(\bar{z}_n^2)-\bar{Y}^2.$$
Consider
$$E(\bar{z}_n^2) = \frac{1}{n^2}E\left[\sum_{i=1}^{N}\alpha_iz_i\right]^2 = \frac{1}{n^2}E\left[\sum_{i=1}^{N}\alpha_i^2z_i^2+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_jz_iz_j\right] = \frac{1}{n^2}\left[\sum_{i=1}^{N}z_i^2E(\alpha_i^2)+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}z_iz_jE(\alpha_i\alpha_j)\right].$$
If $S = \{s\}$ is the set of all possible samples and $\pi_i$ is the probability of inclusion of the $i$th unit in a sample $s$, then
$$E(\alpha_i) = 1\cdot P(y_i\in s)+0\cdot P(y_i\notin s) = \pi_i, \qquad E(\alpha_i^2) = 1^2\cdot P(y_i\in s)+0^2\cdot P(y_i\notin s) = \pi_i,$$
so $E(\alpha_i) = E(\alpha_i^2)$, and
$$E(\bar{z}_n^2) = \frac{1}{n^2}\left[\sum_{i=1}^{N}z_i^2\pi_i+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}\pi_{ij}z_iz_j\right],$$
where $\pi_{ij} = E(\alpha_i\alpha_j)$ is the probability of inclusion of both the $i$th and $j$th units in the sample; it is called the second order inclusion probability. Now
$$\bar{Y}^2 = \left[E(\bar{z}_n)\right]^2 = \frac{1}{n^2}\left[E\left(\sum_{i=1}^{N}\alpha_iz_i\right)\right]^2 = \frac{1}{n^2}\left[\sum_{i=1}^{N}z_i^2\left[E(\alpha_i)\right]^2+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}z_iz_jE(\alpha_i)E(\alpha_j)\right] = \frac{1}{n^2}\left[\sum_{i=1}^{N}z_i^2\pi_i^2+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}\pi_i\pi_jz_iz_j\right].$$
Thus
$$Var(\hat{Y}_{HT}) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\pi_iz_i^2+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}\pi_{ij}z_iz_j\right]-\frac{1}{n^2}\left[\sum_{i=1}^{N}\pi_i^2z_i^2+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}\pi_i\pi_jz_iz_j\right]$$
$$= \frac{1}{n^2}\left[\sum_{i=1}^{N}\pi_i(1-\pi_i)z_i^2+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_{ij}-\pi_i\pi_j)z_iz_j\right]$$
$$= \frac{1}{n^2}\left[\sum_{i=1}^{N}\pi_i(1-\pi_i)\frac{n^2y_i^2}{N^2\pi_i^2}+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_{ij}-\pi_i\pi_j)\frac{n^2y_iy_j}{N^2\pi_i\pi_j}\right]$$
$$= \frac{1}{N^2}\left[\sum_{i=1}^{N}\left(\frac{1-\pi_i}{\pi_i}\right)y_i^2+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}\left(\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\right)y_iy_j\right].$$

Estimate of variance
$$\hat{V}_1 = \widehat{Var}(\hat{Y}_{HT}) = \frac{1}{N^2}\left[\sum_{i=1}^{n}\frac{y_i^2}{\pi_i^2}(1-\pi_i)+\sum_{i(\ne j)=1}^{n}\sum_{j=1}^{n}\left(\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\right)\frac{y_iy_j}{\pi_i\pi_j}\right].$$
This is an unbiased estimator of the variance.

Drawback: It does not reduce to zero when all the $\frac{y_i}{\pi_i}$ are the same, i.e., when $y_i \propto \pi_i$. Consequently, it may assume negative values for some samples.

A more elegant expression for the variance of $\hat{Y}_{HT}$ has been obtained by Yates and Grundy.
Yates and Grundy form of the variance

Since there are exactly $n$ values of $\alpha_i$ which are 1 and $(N-n)$ values which are zero,
$$\sum_{i=1}^{N}\alpha_i = n.$$
Taking expectation on both sides,
$$\sum_{i=1}^{N}E(\alpha_i) = n.$$
Also,
$$E\left(\sum_{i=1}^{N}\alpha_i\right)^2 = \sum_{i=1}^{N}E(\alpha_i^2)+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}E(\alpha_i\alpha_j),$$
so, using $E(\alpha_i) = E(\alpha_i^2)$,
$$E(n^2) = n+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}E(\alpha_i\alpha_j) \;\Rightarrow\; \sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}E(\alpha_i\alpha_j) = n(n-1).$$
Further,
$$E(\alpha_i\alpha_j) = P(\alpha_i=1,\alpha_j=1) = P(\alpha_i=1)P(\alpha_j=1|\alpha_i=1) = E(\alpha_i)E(\alpha_j|\alpha_i=1).$$
Therefore
$$\sum_{j(\ne i)=1}^{N}\left[E(\alpha_i\alpha_j)-E(\alpha_i)E(\alpha_j)\right] = \sum_{j(\ne i)=1}^{N}\left[E(\alpha_i)E(\alpha_j|\alpha_i=1)-E(\alpha_i)E(\alpha_j)\right]$$
$$= E(\alpha_i)\sum_{j(\ne i)=1}^{N}\left[E(\alpha_j|\alpha_i=1)-E(\alpha_j)\right] = E(\alpha_i)\left[(n-1)-(n-E(\alpha_i))\right] = -E(\alpha_i)\left[1-E(\alpha_i)\right] = -\pi_i(1-\pi_i). \tag{1}$$
Similarly,
$$\sum_{i(\ne j)=1}^{N}\left[E(\alpha_i\alpha_j)-E(\alpha_i)E(\alpha_j)\right] = -\pi_j(1-\pi_j). \tag{2}$$
We had earlier derived the variance of the HT estimator as
$$Var(\hat{Y}_{HT}) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\pi_i(1-\pi_i)z_i^2+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_{ij}-\pi_i\pi_j)z_iz_j\right].$$
Using (1) and (2) in this expression, we get
$$Var(\hat{Y}_{HT}) = \frac{1}{2n^2}\left[\sum_{i=1}^{N}\pi_i(1-\pi_i)z_i^2+\sum_{j=1}^{N}\pi_j(1-\pi_j)z_j^2-2\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j-\pi_{ij})z_iz_j\right]$$
$$= \frac{1}{2n^2}\left[\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j-\pi_{ij})z_i^2+\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j-\pi_{ij})z_j^2-2\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j-\pi_{ij})z_iz_j\right]$$
$$= \frac{1}{2n^2}\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j-\pi_{ij})\left(z_i^2+z_j^2-2z_iz_j\right) = \frac{1}{2n^2}\sum_{i(\ne j)=1}^{N}\sum_{j=1}^{N}(\pi_i\pi_j-\pi_{ij})(z_i-z_j)^2.$$

The expressions for $\pi_i$ and $\pi_{ij}$ can be written down for any given sample size. For example, for $n = 2$, assume that at the second draw the probability of selecting a unit from the units available is proportional to the probability of selecting it at the first draw. Since $E(\alpha_i)$ is the probability of selecting $Y_i$ in a sample of two,
$$E(\alpha_i) = P_{i1}+P_{i2},$$
where $P_{ir}$ is the probability of selecting $Y_i$ at the $r$th draw $(r = 1,2)$. If $P_i$ is the probability of selecting the $i$th unit at the first draw, then we had earlier derived that
$$P_{i1} = P_i,$$
$$P_{i2} = P\left[y_i\text{ is not selected at the 1st draw}\right]\cdot P\left[y_i\text{ is selected at the 2nd draw}\mid y_i\text{ not selected at the 1st draw}\right]$$
$$= \sum_{j(\ne i)=1}^{N}P_j\frac{P_i}{1-P_j} = \left[\sum_{j=1}^{N}\frac{P_j}{1-P_j}-\frac{P_i}{1-P_i}\right]P_i.$$
So
$$E(\alpha_i) = \pi_i = P_i\left[1+\sum_{j=1}^{N}\frac{P_j}{1-P_j}-\frac{P_i}{1-P_i}\right].$$
Again,
$$E(\alpha_i\alpha_j) = \pi_{ij} = P\left[\text{both } y_i \text{ and } y_j \text{ are included in a sample of size two}\right] = P_{i1}P_{j2|i}+P_{j1}P_{i2|j}$$
$$= P_i\frac{P_j}{1-P_i}+P_j\frac{P_i}{1-P_j} = P_iP_j\left[\frac{1}{1-P_i}+\frac{1}{1-P_j}\right].$$

Estimate of variance

The (Yates-Grundy) estimate of the variance is given by
$$\widehat{Var}(\hat{Y}_{HT}) = \frac{1}{2n^2}\sum_{i(\ne j)}^{n}\sum_{j=1}^{n}\frac{\pi_i\pi_j-\pi_{ij}}{\pi_{ij}}(z_i-z_j)^2.$$
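The HT estimate and the Yates-Grundy variance estimate are simple to compute once the inclusion probabilities are known. A minimal Python sketch (numpy assumed; the $\pi_i$ and $\pi_{ij}$ values you pass in are hypothetical inputs that would come from the design):

    import numpy as np

    def horvitz_thompson_mean(y, pi, N):
        """Horvitz-Thompson estimate of the population mean.

        y  : sampled y-values
        pi : first order inclusion probabilities of the sampled units
        N  : population size
        """
        y, pi = np.asarray(y, float), np.asarray(pi, float)
        return float(np.sum(y / pi) / N)

    def yates_grundy_var(y, pi, pi_ij, N):
        """Yates-Grundy variance estimate; pi_ij is the matrix of second order
        inclusion probabilities for the sampled units."""
        y, pi = np.asarray(y, float), np.asarray(pi, float)
        n = y.size
        z = n * y / (N * pi)                     # z_i = n y_i / (N pi_i)
        v = 0.0
        for i in range(n):
            for j in range(n):
                if i != j:
                    v += (pi[i] * pi[j] - pi_ij[i, j]) / pi_ij[i, j] \
                         * (z[i] - z[j]) ** 2
        return v / (2 * n ** 2)

    print(horvitz_thompson_mean(y=[30.0, 17.0], pi=[0.25, 0.6], N=10))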
Midzuno system of sampling:

Under this system of selection probabilities, the unit at the first draw is selected with unequal probabilities of selection (i.e., pps), and at all the subsequent draws the remaining units are selected by SRSWOR.

Under this system,
$$E(\alpha_i) = \pi_i = P(\text{unit } i \text{ is included in the sample})$$
$$= P(\text{unit } i \text{ is selected at the first draw})+P(\text{unit } i \text{ is not selected at the first draw but is selected at one of the subsequent } (n-1) \text{ draws})$$
$$= P_i+(1-P_i)\frac{n-1}{N-1} = \frac{N-n}{N-1}P_i+\frac{n-1}{N-1}.$$
Similarly,
$$E(\alpha_i\alpha_j) = P(\text{both } y_i \text{ and } y_j \text{ are in the sample})$$
is the sum of the probabilities of three possibilities: $y_i$ is selected at the first draw and $y_j$ at one of the subsequent $(n-1)$ draws; $y_j$ is selected at the first draw and $y_i$ at one of the subsequent $(n-1)$ draws; neither $y_i$ nor $y_j$ is selected at the first draw but both are selected during the subsequent $(n-1)$ draws. Thus
$$\pi_{ij} = P_i\frac{n-1}{N-1}+P_j\frac{n-1}{N-1}+(1-P_i-P_j)\frac{(n-1)(n-2)}{(N-1)(N-2)} = \frac{n-1}{N-1}\left[\frac{N-n}{N-2}(P_i+P_j)+\frac{n-2}{N-2}\right].$$
Similarly,
$$E(\alpha_i\alpha_j\alpha_k) = \pi_{ijk} = \frac{(n-1)(n-2)}{(N-1)(N-2)}\left[\frac{N-n}{N-3}(P_i+P_j+P_k)+\frac{n-3}{N-3}\right].$$
By an extension of this argument, if $y_i, y_j, \ldots, y_r$ are $r$ units in the sample of size $n$ $(r<n)$, the probability of including these $r$ units in the sample is
$$E(\alpha_i\alpha_j\cdots\alpha_r) = \pi_{ij\ldots r} = \frac{(n-1)(n-2)\cdots(n-r+1)}{(N-1)(N-2)\cdots(N-r+1)}\left[\frac{N-n}{N-r}(P_i+P_j+\cdots+P_r)+\frac{n-r}{N-r}\right].$$
Similarly, if $y_i, y_j, \ldots, y_q$ are the $n$ units in the sample, the probability of including these $n$ units (obtained by substituting $r = n$) is
$$E(\alpha_i\alpha_j\cdots\alpha_q) = \pi_{ij\ldots q} = \frac{(n-1)(n-2)\cdots 1}{(N-1)(N-2)\cdots(N-n+1)}(P_i+P_j+\cdots+P_q) = \frac{1}{\binom{N-1}{n-1}}(P_i+P_j+\cdots+P_q).$$
Thus, if the $P_i$'s are proportional to some measure of the size of the units in the population, then the probability of selecting a specified sample is proportional to the total measure of the size of the units included in the sample.

Substituting these $\pi_i$, $\pi_{ij}$, $\pi_{ijk}$, etc. in the HT estimator, we can obtain estimators of the population mean and variance. In particular, the unbiased estimate of the variance of the HT estimator given by
$$\widehat{Var}(\hat{Y}_{HT}) = \frac{1}{2n^2}\sum_{i(\ne j)=1}^{n}\sum_{j=1}^{n}\frac{\pi_i\pi_j-\pi_{ij}}{\pi_{ij}}(z_i-z_j)^2$$
reduces to
$$\widehat{Var}(\hat{Y}_{HT}) = \frac{N-n}{2(N-1)^2n^2}\sum_{i(\ne j)=1}^{n}\sum_{j=1}^{n}\left[(N-n)P_iP_j+\frac{n-1}{N-2}(1-P_i-P_j)\right]\frac{(z_i-z_j)^2}{\pi_{ij}}.$$

The main advantage of this method of sampling is that it is possible to compute a set of revised probabilities of selection such that the inclusion probabilities resulting from the revised probabilities are proportional to the initial probabilities of selection. It is desirable to do so since the initial probabilities can be chosen proportional to some measure of size.
Sampling Theory
MODULE VIII
LECTURE - 28
DOUBLE SAMPLING
(TWO PHASE SAMPLING)

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
The ratio and regression methods of estimation require the knowledge of the population mean of the auxiliary variable $(\bar{X})$ to estimate the population mean of the study variable $(\bar{Y})$. If information on the auxiliary variable is not available, then there are two options. One option is to collect a sample only on the study variable and use the sample mean as an estimator of the population mean.

An alternative solution is to use a part of the budget for collecting information on the auxiliary variable: a large preliminary sample is collected in which $x_i$ alone is measured. The purpose of this sampling is to furnish a good estimate of $\bar{X}$. This method is appropriate when the information about $x_i$ is on file cards that have not been tabulated. After collecting a large preliminary sample of $n'$ units from the population, a smaller sample of size $n$ is selected from it and the information on $y$ is collected. These two estimates are then used to obtain an estimator of the population mean $\bar{Y}$. This procedure of selecting a large sample for collecting information on the auxiliary variable $x$ and then selecting a sub-sample from it for collecting the information on the study variable $y$ is called double sampling or two phase sampling. It is useful when it is considerably cheaper and quicker to collect data on $x$ than on $y$ and there is high correlation between $x$ and $y$.

In this sampling, the randomization is done twice. First a random sample of size $n'$ is drawn from the population of size $N$, and then again a random sample of size $n$ is drawn from the first sample of size $n'$. So the sample mean in this sampling is a function of the two phases of sampling. If SRSWOR is utilized to draw the samples at both phases, then
- the number of possible samples at the first phase, when a sample of size $n'$ is drawn from a population of size $N$, is $\binom{N}{n'} = M_0$, say;
- the number of possible samples at the second phase, where a sample of size $n$ is drawn from the first phase sample of size $n'$, is $\binom{n'}{n} = M_1$, say.
[Diagram: population of $N$ units → first-phase (large) sample of $n'$ units ($M_0$ possible samples) → second-phase subsample of $n$ units ($M_1$ possible samples).]
Then the sample mean is a function of two random variables. If $\theta$ is a statistic calculated at the second phase, taking values $\theta_{ij}$, $i = 1,2,\ldots,M_0$, $j = 1,2,\ldots,M_1$, with $P_{ij}$ being the probability that the $i$th sample is chosen at the first phase and the $j$th sample at the second phase, then
$$E(\theta) = E_1\left[E_2(\theta)\right],$$
where $E_2(\cdot)$ denotes the expectation over the second phase and $E_1(\cdot)$ the expectation over the first phase. Thus
$$E(\theta) = \sum_{i=1}^{M_0}\sum_{j=1}^{M_1}P_{ij}\theta_{ij} = \sum_{i=1}^{M_0}\sum_{j=1}^{M_1}P_iP_{j|i}\theta_{ij} \qquad(\text{using } P(A\cap B) = P(A)P(B|A))$$
$$= \underbrace{\sum_{i=1}^{M_0}P_i}_{\text{1st stage}}\;\underbrace{\sum_{j=1}^{M_1}P_{j|i}\theta_{ij}}_{\text{2nd stage}}.$$
Variance of $\theta$
$$Var(\theta) = E\left[\theta-E(\theta)\right]^2 = E\left[(\theta-E_2(\theta))+(E_2(\theta)-E(\theta))\right]^2$$
$$= E\left[\theta-E_2(\theta)\right]^2+E\left[E_2(\theta)-E(\theta)\right]^2+0,$$
the cross term vanishing because $E_2(\theta)-E(\theta)$ is constant under $E_2$ and $E_2\left[\theta-E_2(\theta)\right] = 0$. Hence
$$Var(\theta) = E_1E_2\left[\theta-E_2(\theta)\right]^2+E_1\left[E_2(\theta)-E_1(E_2(\theta))\right]^2 = E_1\left[V_2(\theta)\right]+V_1\left[E_2(\theta)\right].$$

Note: Two phase sampling can be extended to more than two phases depending upon the need and objective of the experiment; the various expectations can also be extended along similar lines.
Double sampling in the ratio method of estimation

If the population mean $\bar{X}$ is not known, then the double sampling technique is applied. Take a large initial sample of size $n'$ by SRSWOR to estimate the population mean $\bar{X}$ as
$$\hat{\bar{X}} = \bar{x}' = \frac{1}{n'}\sum_{i=1}^{n'}x_i.$$
Then a second sample, a subsample of size $n$, is selected from the initial sample by SRSWOR. Let $\bar{y}$ and $\bar{x}$ be the means of $y$ and $x$ based on the subsample. Then $E(\bar{x}') = \bar{X}$, $E(\bar{x}) = \bar{X}$, $E(\bar{y}) = \bar{Y}$.

The ratio estimator under double sampling now becomes
$$\hat{Y}_{Rd} = \frac{\bar{y}}{\bar{x}}\bar{x}'.$$
The exact expressions for the bias and the mean squared error of $\hat{Y}_{Rd}$ are difficult to derive, so we find approximate expressions using the same approach as in the ratio method of estimation. Let
$$\varepsilon_0 = \frac{\bar{y}-\bar{Y}}{\bar{Y}}, \qquad \varepsilon_1 = \frac{\bar{x}-\bar{X}}{\bar{X}}, \qquad \varepsilon_2 = \frac{\bar{x}'-\bar{X}}{\bar{X}}.$$
E ( 0 )  E (1 )  E ( 2 )  0
1 1 
E (12 )     Cx2
n N 
1
E (1 2 )  E ( x  X )( x ' X )
X2
1
 E1  E2 ( x  X )( x ' X ) | n '
X2 
1
 E1 ( x ' X ) 2 
X2 
2
1 1S
    x2
 n' N  X
1 1
    Cx2
 n' N 
 E ( 22 ).
E ( 0 2 )  Cov( y , x ')
 Cov  E ( y | n '), E ( x ' | n ')  E Cov( y , x ') | n '
 Cov Y , X   E Cov( y ', x ')
 Cov  ( y ', x ')
 1 1  S xy
  
 n ' N  XY
 1 1  S Sy
   x
 n' N  X Y
1 1
     Cx C y
 n' N 
7

where y ' is the sample mean of y ' s based on the sample size n '.
$$E(\varepsilon_0\varepsilon_1) = \frac{1}{\bar{X}\bar{Y}}Cov(\bar{y},\bar{x}) = \left(\frac{1}{n}-\frac{1}{N}\right)\frac{S_{xy}}{\bar{X}\bar{Y}} = \left(\frac{1}{n}-\frac{1}{N}\right)\rho C_xC_y,$$
$$E(\varepsilon_0^2) = \frac{1}{\bar{Y}^2}Var(\bar{y}) = \frac{1}{\bar{Y}^2}\left[V_1\left\{E_2(\bar{y}\mid n')\right\}+E_1\left\{V_2(\bar{y}\mid n')\right\}\right] = \frac{1}{\bar{Y}^2}\left[V_1(\bar{y}')+E_1\left\{\left(\frac{1}{n}-\frac{1}{n'}\right)s_y'^2\right\}\right]$$
$$= \frac{1}{\bar{Y}^2}\left[\left(\frac{1}{n'}-\frac{1}{N}\right)S_y^2+\left(\frac{1}{n}-\frac{1}{n'}\right)S_y^2\right] = \left(\frac{1}{n}-\frac{1}{N}\right)\frac{S_y^2}{\bar{Y}^2} = \left(\frac{1}{n}-\frac{1}{N}\right)C_y^2,$$
where $s_y'^2$ is the mean sum of squares of $y$ based on the initial sample of size $n'$.
Consistently,

E(ε₁ε₂) = (1/X̄²) Cov(x̄, x̄′)
        = (1/X̄²) [Cov{E(x̄ | n′), E(x̄′ | n′)} + 0]
        = (1/X̄²) Var(x̄′)
        = (1/n′ − 1/N) Cx²

where Var(x̄′) is the variance of the mean of x based on the initial sample of size n′.

Estimation error of ŶRd

Write ŶRd as

ŶRd = [(1 + ε₀)Ȳ / ((1 + ε₁)X̄)] (1 + ε₂)X̄
    = Ȳ (1 + ε₀)(1 + ε₂)(1 + ε₁)⁻¹
    = Ȳ (1 + ε₀)(1 + ε₂)(1 − ε₁ + ε₁² − ...)
    ≈ Ȳ (1 + ε₀ + ε₂ + ε₀ε₂ − ε₁ − ε₀ε₁ − ε₁ε₂ + ε₁²)

up to the terms of order two. Other terms of degree greater than two are assumed to be negligible.
Bias of ŶRd

E(ŶRd) = Ȳ [1 + E(ε₀ε₂) − E(ε₀ε₁) − E(ε₁ε₂) + E(ε₁²)]

Bias(ŶRd) = E(ŶRd) − Ȳ
          = Ȳ [E(ε₀ε₂) − E(ε₀ε₁) − E(ε₁ε₂) + E(ε₁²)]
          = Ȳ [(1/n′ − 1/N) ρCxCy − (1/n − 1/N) ρCxCy − (1/n′ − 1/N) Cx² + (1/n − 1/N) Cx²]
          = Ȳ (1/n − 1/n′)(Cx² − ρCxCy)
          = Ȳ (1/n − 1/n′) Cx (Cx − ρCy).

The bias is negligible if n is large, and the relative bias vanishes if Cx² = ρCxCy, i.e., if the regression line of y on x passes through the origin.

Mean squared error of ŶRd

MSE(ŶRd) = E(ŶRd − Ȳ)²
         ≈ Ȳ² E(ε₀ + ε₂ − ε₁)²   (retaining the terms up to order two)
         = Ȳ² E[ε₀² + ε₁² + ε₂² + 2ε₀ε₂ − 2ε₀ε₁ − 2ε₁ε₂]
         = Ȳ² [(1/n − 1/N) Cy² + (1/n − 1/N) Cx² + (1/n′ − 1/N) Cx² + 2(1/n′ − 1/N) ρCxCy − 2(1/n − 1/N) ρCxCy − 2(1/n′ − 1/N) Cx²]
         = Ȳ² (1/n − 1/N)(Cx² + Cy² − 2ρCxCy) + Ȳ² (1/n′ − 1/N) Cx (2ρCy − Cx)

 1 1
 MSE (ratio estimator)  Y 2     2  Cx C y  Cx2  .
 n' n 
The second term is the contribution of second phase of sampling. This method is preferred over ratio method if

2  Cx C y  Cx2  0
1 Cx
or   .
2 Cy

Choice of n and n′

Write

MSE(ŶRd) = V/n + V′/n′

where V and V′ collect all the terms multiplying 1/n and 1/n′ respectively.

The cost function is C₀ = nC + n′C′, where C and C′ are the costs per unit for selecting the samples of sizes n and n′ respectively.

Now we find the optimum sample sizes n and n′ for fixed cost C₀. The Lagrangian function is

φ = V/n + V′/n′ + λ(nC + n′C′ − C₀)

∂φ/∂n = 0  ⟹  λC = V/n²
∂φ/∂n′ = 0  ⟹  λC′ = V′/n′².

Thus λCn² = V, or n = √(V/(λC)), or √λ nC = √(VC).

Similarly, √λ n′C′ = √(V′C′).

Thus

√λ C₀ = √λ (nC + n′C′) = √(VC) + √(V′C′),

and so

Optimum n = [C₀/(√(VC) + √(V′C′))] √(V/C) = n_opt, say
Optimum n′ = [C₀/(√(VC) + √(V′C′))] √(V′/C′) = n′_opt, say

Var_opt(ŶRd) = V/n_opt + V′/n′_opt = (√(VC) + √(V′C′))²/C₀.

Comparison with sample mean under SRS

If x is ignored and all resources are used to estimate Ȳ by ȳ alone, then the sample size available for the fixed cost C₀ is C₀/C, and (ignoring the finite population correction)

Var(ȳ) = Sy²/(C₀/C) = C Sy²/C₀.

Relative efficiency = Var(ȳ)/Var_opt(ŶRd) = C Sy²/(√(VC) + √(V′C′))².
Sampling Theory
MODULE VIII
LECTURE - 29
DOUBLE SAMPLING
(TWO PHASE SAMPLING)

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Double sampling in regression method of estimation

When the population mean X̄ of the auxiliary variable is not known, double sampling is used as follows:

• A large sample of size n′ is taken from the population by SRSWOR, from which the population mean X̄ is estimated as x̄′, i.e., X̂ = x̄′.

• Then a subsample of size n is chosen from the larger sample, and both the variables x and y are measured on it, taking x̄′ in place of X̄ and treating it as if it were known.

Then E(x̄′) = X̄, E(x̄) = X̄, E(ȳ) = Ȳ. The regression estimate of Ȳ in this case is given by

Ŷ_regd = ȳ + β̂ (x̄′ − x̄)

where

β̂ = s_xy/s_x² = [Σ_{i=1}^{n} (xᵢ − x̄)(yᵢ − ȳ)] / [Σ_{i=1}^{n} (xᵢ − x̄)²]

is an estimator of β = S_xy/S_x² based on the sample of size n.

It is difficult to find the exact properties like the bias and mean squared error of Ŷ_regd, so we derive their approximate expressions.
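A minimal sketch (hypothetical data, same layout as the ratio example above) of the double-sampling regression estimate:

```python
import numpy as np

def regression_double_sampling(x_first, y_sub, x_sub):
    """Y_hat_regd = ybar + betahat * (xbar' - xbar), with
    betahat = s_xy / s_x^2 computed from the subsample."""
    s_xy = np.cov(x_sub, y_sub, ddof=1)[0, 1]
    s_x2 = x_sub.var(ddof=1)
    beta_hat = s_xy / s_x2
    return y_sub.mean() + beta_hat * (x_first.mean() - x_sub.mean())

rng = np.random.default_rng(2)
x_first = rng.normal(50, 10, size=500)             # n' = 500, only x measured
idx = rng.choice(500, size=80, replace=False)
x_sub = x_first[idx]                               # n = 80, both x and y
y_sub = 2.0 + 0.8 * x_sub + rng.normal(0, 3, size=80)
print(regression_double_sampling(x_first, y_sub, x_sub))
```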
Let

ε₁ = (x̄ − X̄)/X̄  ⟹  x̄ = (1 + ε₁) X̄
ε₂ = (x̄′ − X̄)/X̄  ⟹  x̄′ = (1 + ε₂) X̄
ε₃ = (s_xy − S_xy)/S_xy  ⟹  s_xy = (1 + ε₃) S_xy
ε₄ = (s_x² − S_x²)/S_x²  ⟹  s_x² = (1 + ε₄) S_x²

with E(ε₁) = 0, E(ε₂) = 0, E(ε₃) = 0, E(ε₄) = 0.
Define μ₂₁ = E[(x − X̄)²(y − Ȳ)] and μ₃₀ = E[(x − X̄)³].

Estimation error:

Ŷ_regd = ȳ + β̂ (x̄′ − x̄)
       = ȳ + [S_xy(1 + ε₃)/(S_x²(1 + ε₄))] (ε₂ − ε₁) X̄
       = ȳ + X̄ β (1 + ε₃)(ε₂ − ε₁)(1 − ε₄ + ε₄² − ...).

Retaining the powers of ε's up to order two (assuming |ε₄| < 1 so that the expansion is valid),

Ŷ_regd ≈ ȳ + X̄ β (ε₂ + ε₂ε₃ − ε₂ε₄ − ε₁ − ε₁ε₃ + ε₁ε₄).
Bias:

E(Ŷ_regd) = Ȳ + X̄ β [E(ε₂ε₃) − E(ε₂ε₄) − E(ε₁ε₃) + E(ε₁ε₄)]

Bias(Ŷ_regd) = E(Ŷ_regd) − Ȳ.

The four expectations are the relative covariances of x̄′ and x̄ with s_xy and s_x²; under SRSWOR, up to this order of approximation,

E(ε₂ε₃) = (1/n′ − 1/N) μ₂₁/(X̄ S_xy),   E(ε₂ε₄) = (1/n′ − 1/N) μ₃₀/(X̄ S_x²),
E(ε₁ε₃) = (1/n − 1/N) μ₂₁/(X̄ S_xy),   E(ε₁ε₄) = (1/n − 1/N) μ₃₀/(X̄ S_x²).

Hence

Bias(Ŷ_regd) = X̄ β [(1/n′ − 1/N) μ₂₁/(X̄ S_xy) − (1/n′ − 1/N) μ₃₀/(X̄ S_x²) − (1/n − 1/N) μ₂₁/(X̄ S_xy) + (1/n − 1/N) μ₃₀/(X̄ S_x²)]
             = −β (1/n − 1/n′) [μ₂₁/S_xy − μ₃₀/S_x²].
Mean squared error:

MSE(Ŷ_regd) = E(Ŷ_regd − Ȳ)²
            = E[(ȳ − Ȳ) + X̄ β (1 + ε₃)(ε₂ − ε₁)(1 − ε₄ + ε₄² − ...)]².

Retaining the powers of ε's up to order two,

MSE(Ŷ_regd) ≈ E[(ȳ − Ȳ) + X̄ β (ε₂ − ε₁)]²
            = E(ȳ − Ȳ)² + X̄²β² E(ε₁² + ε₂² − 2ε₁ε₂) + 2X̄β E[(ȳ − Ȳ)(ε₂ − ε₁)]
            = Var(ȳ) + X̄²β² [(1/n − 1/N) + (1/n′ − 1/N) − 2(1/n′ − 1/N)] (S_x²/X̄²) + 2β [(1/n′ − 1/N) − (1/n − 1/N)] S_xy
            = Var(ȳ) + β² (1/n − 1/n′) S_x² − 2β (1/n − 1/n′) S_xy
            = Var(ȳ) − (1/n − 1/n′) (2β S_xy − β² S_x²)
            = Var(ȳ) − (1/n − 1/n′) (S_xy/S_x)²
            = (1/n − 1/N) S_y² − (1/n − 1/n′) ρ² S_y²   (using S_xy = ρ S_x S_y)
            ≈ (1 − ρ²) S_y²/n + ρ² S_y²/n′.   (ignoring the finite population correction)
Clearly, Ŷ_regd is more efficient than the sample mean under SRS, i.e., the estimator used when no auxiliary variable is available.

Now we address the issue of whether the reduction in variability is worth the extra expenditure required to observe the auxiliary variable.

Let the total cost of the survey be

C₀ = C₁n + C₂n′

where C₁ and C₂ are the costs per unit of observing the study variable y and the auxiliary variable x respectively.

Now minimize MSE(Ŷ_regd) for fixed cost C₀ using the Lagrangian function with Lagrangian multiplier λ:

φ = S_y²(1 − ρ²)/n + ρ²S_y²/n′ + λ(C₁n + C₂n′ − C₀)

∂φ/∂n = 0  ⟹  −S_y²(1 − ρ²)/n² + λC₁ = 0
∂φ/∂n′ = 0  ⟹  −ρ²S_y²/n′² + λC₂ = 0.

Thus

n = S_y √(1 − ρ²)/√(λC₁)   and   n′ = ρS_y/√(λC₂).
Substituting these values in the cost function, we have

C₀ = C₁n + C₂n′ = √C₁ S_y √(1 − ρ²)/√λ + √C₂ ρS_y/√λ

or

√λ C₀ = S_y √(C₁(1 − ρ²)) + ρS_y √C₂

or

λ = (1/C₀²) [S_y √(C₁(1 − ρ²)) + ρS_y √C₂]².

Thus the optimum values of n and n′ are

n′_opt = ρS_y C₀ / (√C₂ [S_y √(C₁(1 − ρ²)) + ρS_y √C₂])

n_opt = C₀ S_y √(1 − ρ²) / (√C₁ [S_y √(C₁(1 − ρ²)) + ρS_y √C₂]).

The optimum mean squared error of Ŷ_regd is obtained by substituting n = n_opt and n′ = n′_opt as

MSE(Ŷ_regd)_opt = S_y²(1 − ρ²)/n_opt + ρ²S_y²/n′_opt
               = (1/C₀) [S_y √(C₁(1 − ρ²)) + ρS_y √C₂]²
               = (S_y²/C₀) [√(C₁(1 − ρ²)) + ρ√C₂]².
The optimum variance for SRS, where no auxiliary information is used, is

Var(ȳ_SRS)_opt = C₁S_y²/C₀

which is obtained by substituting ρ = 0, C₂ = 0 in MSE(Ŷ_regd)_opt.

The relative efficiency is

RE = Var(ȳ_SRS)_opt / MSE(Ŷ_regd)_opt = C₁S_y² / (S_y² [√(C₁(1 − ρ²)) + ρ√C₂]²)
   = 1 / [√(1 − ρ²) + ρ√(C₂/C₁)]².

Thus double sampling with the regression estimator leads to a gain in precision (RE > 1) if

C₁/C₂ > ρ² / [1 − √(1 − ρ²)]².
Double sampling for probability proportional to size estimation:

Suppose it is desired to select the sample with probability proportional to the auxiliary variable x, but information on x is not available. In this situation, double sampling can be used. An initial sample of size n′ is selected with SRSWOR from the population of size N, and information on x is collected for this sample. Then a second sample of size n is selected with replacement and with probability proportional to x from the initial sample of size n′. Let x̄′ denote the mean of x for the initial sample of size n′, and let x̄ and ȳ denote the means of x and y respectively for the second sample of size n. Then we have the following theorem.

Theorem:

(1) An unbiased estimator of the population mean Ȳ is given by

Ŷ = (x′_tot/(n′n)) Σ_{i=1}^{n} (yᵢ/xᵢ),

where x′_tot denotes the total of x in the first sample.

(2) Var(Ŷ) = (1/n′ − 1/N) S_y² + [(n′ − 1)/(N(N − 1) n n′)] Σ_{i=1}^{N} (xᵢ/X_tot)(X_tot yᵢ/xᵢ − Y_tot)²,

where X_tot and Y_tot denote the totals of x and y respectively in the population.

(3) An unbiased estimator of the variance of Ŷ is given by

Var̂(Ŷ) = (1/n′ − 1/N) [1/(n(n′ − 1))] [x′_tot Σ_{i=1}^{n} yᵢ²/xᵢ − x′²_tot (A − B)/(n′(n − 1))] + [1/(n(n − 1))] Σ_{i=1}^{n} (x′_tot yᵢ/(n′xᵢ) − Ŷ)²

where A = (Σ_{i=1}^{n} yᵢ/xᵢ)² and B = Σ_{i=1}^{n} yᵢ²/xᵢ².
Proof: Before deriving the results, we first recall the following result proved for the varying probability scheme.

Result: In sampling with a varying probability scheme, drawing a sample of size n from a population of size N with replacement:

(i) z̄ = (1/n) Σ_{i=1}^{n} zᵢ is an unbiased estimator of the population mean Ȳ, where zᵢ = yᵢ/(NPᵢ), Pᵢ being the probability of selection of the ith unit;

(ii) Var(z̄) = (1/(nN²)) [Σ_{i=1}^{N} yᵢ²/Pᵢ − N²Ȳ²] = (1/(nN²)) Σ_{i=1}^{N} Pᵢ (yᵢ/Pᵢ − NȲ)²;

(iii) an unbiased estimator of the variance of z̄ is Var̂(z̄) = (1/(n(n − 1))) Σ_{i=1}^{n} (yᵢ/(NPᵢ) − z̄)².

Let E₂ denote the conditional expectation of Ŷ when the first sample is fixed. The second sample is selected with probability proportional to x, hence using result (i) with Pᵢ = xᵢ/x′_tot, we find that

E₂(Ŷ | n′) = E₂[(x′_tot/(n′n)) Σ_{i=1}^{n} (yᵢ/xᵢ)] = ȳ′

where ȳ′ is the mean of y for the first sample. Hence
E(Ŷ) = E₁[E₂(Ŷ | n′)] = E₁(ȳ′) = Ȳ,

which proves part (1) of the theorem.

Further,

Var(Ŷ) = V₁[E₂(Ŷ | n′)] + E₁[V₂(Ŷ | n′)]
       = V₁(ȳ′) + E₁[V₂(Ŷ | n′)]
       = (1/n′ − 1/N) S_y² + E₁[V₂(Ŷ | n′)].
Now, using result (ii) (applied at the second phase, with the first sample in the role of the population), we get

V₂(Ŷ | n′) = (1/(n n′²)) Σ_{i=1}^{n′} (xᵢ/x′_tot)(x′_tot yᵢ/xᵢ − y′_tot)²
           = (1/(n n′²)) ΣΣ_{i<j} xᵢxⱼ (yᵢ/xᵢ − yⱼ/xⱼ)²   (sum over pairs in the first sample),

where y′_tot is the total of y in the first sample, and hence

E₁[V₂(Ŷ | n′)] = (1/(n n′²)) [n′(n′ − 1)/(N(N − 1))] ΣΣ_{i<j} xᵢxⱼ (yᵢ/xᵢ − yⱼ/xⱼ)²   (sum over pairs in the population),

since the probability of a specified pair of units being selected in the first sample is n′(n′ − 1)/(N(N − 1)). So we can express

E₁[V₂(Ŷ | n′)] = [(n′ − 1)/(n n′ N(N − 1))] Σ_{i=1}^{N} (xᵢ/X_tot)(X_tot yᵢ/xᵢ − Y_tot)².

Substituting this in Var(Ŷ), we get

Var(Ŷ) = (1/n′ − 1/N) S_y² + [(n′ − 1)/(n n′ N(N − 1))] Σ_{i=1}^{N} (xᵢ/X_tot)(X_tot yᵢ/xᵢ − Y_tot)².

This proves part (2) of the theorem.

We now consider the estimation of Var(Ŷ). Given the first sample, we obtain

E₂[(1/n) Σ_{i=1}^{n} yᵢ²/Pᵢ] = Σ_{i=1}^{n′} yᵢ²,  where Pᵢ = xᵢ/x′_tot.

Also, given the first sample, result (iii) gives

E₂[(1/(n(n − 1))) Σ_{i=1}^{n} (yᵢ/(n′Pᵢ) − Ŷ)²] = V₂(Ŷ | n′) = E₂(Ŷ²) − ȳ′².

Hence,

E₂[Ŷ² − (1/(n(n − 1))) Σ_{i=1}^{n} (yᵢ/(n′Pᵢ) − Ŷ)²] = ȳ′².
Substituting Ŷ = (x′_tot/(n′n)) Σ (yᵢ/xᵢ) and Pᵢ = xᵢ/x′_tot, the expression becomes

E₂[(x′²_tot/(n n′²(n − 1))) ((Σ_{i=1}^{n} yᵢ/xᵢ)² − Σ_{i=1}^{n} yᵢ²/xᵢ²)] = ȳ′².

Using E₂[(1/n) Σ_{i=1}^{n} yᵢ²/Pᵢ] = Σ_{i=1}^{n′} yᵢ², we get

E₂[(x′_tot/n) Σ_{i=1}^{n} yᵢ²/xᵢ − x′²_tot (A − B)/(n n′(n − 1))] = Σ_{i=1}^{n′} yᵢ² − n′ȳ′²

where A = (Σ_{i=1}^{n} yᵢ/xᵢ)² and B = Σ_{i=1}^{n} yᵢ²/xᵢ², which further simplifies to

E₂[(1/(n(n′ − 1))) (x′_tot Σ_{i=1}^{n} yᵢ²/xᵢ − x′²_tot (A − B)/(n′(n − 1)))] = s′y²,

where s′y² is the mean sum of squares of y for the first sample. Thus we obtain

E₁E₂[(1/(n(n′ − 1))) (x′_tot Σ_{i=1}^{n} yᵢ²/xᵢ − x′²_tot (A − B)/(n′(n − 1)))] = E₁(s′y²) = S_y²   (1)

which gives an unbiased estimator of S_y².
Next, since

E₁[V₂(Ŷ | n′)] = [(n′ − 1)/(n n′ N(N − 1))] Σ_{i=1}^{N} (xᵢ/X_tot)(X_tot yᵢ/xᵢ − Y_tot)²,

and from result (iii)

E₂[(1/(n(n − 1))) Σ_{i=1}^{n} (x′_tot yᵢ/(n′xᵢ) − Ŷ)²] = V₂(Ŷ | n′),

we obtain

E₁E₂[(1/(n(n − 1))) Σ_{i=1}^{n} (x′_tot yᵢ/(n′xᵢ) − Ŷ)²] = [(n′ − 1)/(n n′ N(N − 1))] Σ_{i=1}^{N} (xᵢ/X_tot)(X_tot yᵢ/xᵢ − Y_tot)²   (2)

which gives an unbiased estimator of the second term of Var(Ŷ).
Using (1) and (2), an unbiased estimator of the variance of Ŷ is obtained as

Var̂(Ŷ) = (1/n′ − 1/N) [1/(n(n′ − 1))] [x′_tot Σ_{i=1}^{n} yᵢ²/xᵢ − x′²_tot (A − B)/(n′(n − 1))] + [1/(n(n − 1))] Σ_{i=1}^{n} (x′_tot yᵢ/(n′xᵢ) − Ŷ)²,

which is part (3) of the theorem. Thus the theorem is proved.
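A minimal sketch (hypothetical data) of the two-phase PPS estimator of the theorem: phase 1 is SRSWOR for x; phase 2 draws n units from the phase-1 sample with replacement, with probability proportional to x.

```python
import numpy as np

def pps_double_sampling(x_first, y_of, n, N, rng):
    """x_first: phase-1 x values (size n'); y_of[i]: y of phase-1 unit i
    (revealed only if subsampled); N: population size. Returns the
    estimator Y_hat (part 1) and its variance estimate (part 3)."""
    n_prime = x_first.size
    x_tot = x_first.sum()                               # x'_tot
    pick = rng.choice(n_prime, size=n, replace=True, p=x_first / x_tot)
    x2, y2 = x_first[pick], y_of[pick]

    Y_hat = x_tot / (n_prime * n) * np.sum(y2 / x2)

    A = np.sum(y2 / x2) ** 2
    B = np.sum(y2 ** 2 / x2 ** 2)
    term1 = (1 / n_prime - 1 / N) / (n * (n_prime - 1)) * (
        x_tot * np.sum(y2 ** 2 / x2)
        - x_tot ** 2 * (A - B) / (n_prime * (n - 1)))
    term2 = np.sum((x_tot * y2 / (n_prime * x2) - Y_hat) ** 2) / (n * (n - 1))
    return Y_hat, term1 + term2

rng = np.random.default_rng(3)
pop_x = rng.uniform(1, 5, size=500)                     # hypothetical population
pop_y = 10 * pop_x + rng.normal(0, 1, size=500)
first = rng.choice(500, size=40, replace=False)         # n' = 40 by SRSWOR
print(pps_double_sampling(pop_x[first], pop_y[first], n=8, N=500, rng=rng))
```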
Sampling Theory

MODULE IX
LECTURE - 30
CLUSTER SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
It is one of the basic assumptions in any sampling procedure that the population can be divided into a finite number of distinct and identifiable units, called sampling units. The smallest units into which the population can be divided are called elements of the population. The groups of such elements are called clusters.

In many practical situations and for many types of populations, a list of elements is not available, and so the use of an element as a sampling unit is not feasible. The method of cluster sampling or area sampling can be used in such situations.

In cluster sampling:

• divide the whole population into clusters according to some well defined rule;
• treat the clusters as sampling units;
• choose a sample of clusters according to some procedure;
• carry out a complete enumeration of the selected clusters, i.e., collect information on all the sampling units available in the selected clusters.

Area Sampling

In case the entire area containing the population is subdivided into smaller area segments, and each element in the population is associated with one and only one such area segment, the procedure is called area sampling.
Examples

• In a city, a list of all the individual persons staying in the houses may be difficult to obtain or may not even be available, but a list of all the houses in the city may be available. So every individual person will be treated as a sampling unit and every house will be a cluster.

• The list of all the agricultural farms in a village or a district may not be easily available, but the list of villages or districts is generally available. In this case, every farm is a sampling unit and every village or district is a cluster.

Moreover, it is easier, faster, cheaper and more convenient to collect information on clusters rather than on individual sampling units.

In both the examples, draw a sample of clusters from the houses/villages and then collect the observations on all the sampling units available in the selected clusters.
Conditions under which cluster sampling is used

Cluster sampling is preferred when

i. no reliable listing of elements is available and it is expensive to prepare one;

ii. even if the list of elements is available, the location or identification of the units may be difficult;

iii. a necessary condition for the validity of this procedure is that every unit of the population under study must correspond to one and only one unit of a cluster, so that the total number of sampling units in the frame covers all the units of the population under study without any omission or duplication. When this condition is not satisfied, bias is introduced.

Open segment and closed segment

It is not necessary that all the elements associated with an area segment be located physically within its boundaries. For example, in the study of farms, the different fields of the same farm need not lie within the same area segment. Such a segment is called an open segment.

In a closed segment, the sum of the characteristic under study, i.e., area, livestock etc., for all the elements associated with the segment accounts for all the area, livestock etc. within the segment.
Construction of clusters

The clusters are constructed such that the sampling units are heterogeneous within the clusters and homogeneous among the clusters. The reason for this will become clear later. This is opposite to the construction of the strata in stratified sampling.

There are two options to construct the clusters – equal size and unequal size. We discuss the estimation of the population mean and its variance in both the cases.

Case of equal clusters

• Suppose the population is divided into N clusters and each cluster is of size M.
• Select a sample of n clusters from the N clusters by the method of SRS, generally WOR.

So
total population size = NM
total sample size = nM.

Let

yᵢⱼ : value of the characteristic under study for the jth element (j = 1,2,...,M) in the ith cluster (i = 1,2,...,N)

ȳᵢ = (1/M) Σ_{j=1}^{M} yᵢⱼ : mean per element of the ith cluster.
Estimation of population mean

First select n clusters from the N clusters by SRSWOR. Based on all the units in every selected cluster, find the mean of each cluster separately, giving the cluster means ȳ₁, ȳ₂,..., ȳₙ. Consider the mean of all such cluster means as an estimator of the population mean:

ȳ_cl = (1/n) Σ_{i=1}^{n} ȳᵢ.

Bias:

E(ȳ_cl) = (1/n) Σ_{i=1}^{n} E(ȳᵢ) = (1/n) Σ_{i=1}^{n} Ȳ = Ȳ   (since SRS is used).

Thus ȳ_cl is an unbiased estimator of Ȳ.
Variance

The variance of ȳ_cl can be derived on the same lines as the variance of the sample mean in SRSWOR. The only difference is that in SRSWOR the sampling units are y₁, y₂,..., yₙ, whereas in the case of ȳ_cl the sampling units are the cluster means ȳ₁, ȳ₂,..., ȳₙ.

[Note that in case of SRSWOR, Var(ȳ) = ((N − n)/(Nn)) S² and Var̂(ȳ) = ((N − n)/(Nn)) s².]

Var(ȳ_cl) = E(ȳ_cl − Ȳ)² = ((N − n)/(Nn)) S_b²

where S_b² = (1/(N − 1)) Σ_{i=1}^{N} (ȳᵢ − Ȳ)², which is the mean sum of squares between the cluster means in the population.
Estimate of variance

Using again the philosophy of the estimate of variance in case of SRSWOR, we can find

Var̂(ȳ_cl) = ((N − n)/(Nn)) s_b²

where s_b² = (1/(n − 1)) Σ_{i=1}^{n} (ȳᵢ − ȳ_cl)² is the mean sum of squares between the cluster means in the sample.

Comparison with SRS

If an equivalent sample of nM units were to be selected from the population of NM units by SRSWOR, the variance of the mean per element would be

Var(ȳ_nM) = ((NM − nM)/(NM · nM)) S² = (f/n)(S²/M)

where f = (N − n)/N and S² = (1/(NM − 1)) Σ_{i=1}^{N} Σ_{j=1}^{M} (yᵢⱼ − Ȳ)².

Also

Var(ȳ_cl) = ((N − n)/(Nn)) S_b² = (f/n) S_b².
Consider

(NM − 1) S² = Σ_{i=1}^{N} Σ_{j=1}^{M} (yᵢⱼ − Ȳ)²
            = Σ_{i=1}^{N} Σ_{j=1}^{M} [(yᵢⱼ − ȳᵢ) + (ȳᵢ − Ȳ)]²
            = Σ_{i=1}^{N} Σ_{j=1}^{M} (yᵢⱼ − ȳᵢ)² + Σ_{i=1}^{N} Σ_{j=1}^{M} (ȳᵢ − Ȳ)²
            = N(M − 1) S_w² + M(N − 1) S_b²

where

S_w² = (1/N) Σ_{i=1}^{N} Sᵢ² is the mean sum of squares within clusters in the population, and
Sᵢ² = (1/(M − 1)) Σ_{j=1}^{M} (yᵢⱼ − ȳᵢ)² is the mean sum of squares for the ith cluster.

The efficiency of cluster sampling over SRSWOR is

E = Var(ȳ_nM)/Var(ȳ_cl) = S²/(M S_b²)
  = (1/(NM − 1)) [N(M − 1)/M · (S_w²/S_b²) + (N − 1)].

Thus the relative efficiency increases when S_w² is large and S_b² is small. So cluster sampling will be efficient if the clusters are so formed that the variation between the cluster means is as small as possible while the variation within the clusters is as large as possible.
Efficiency in terms of intraclass correlation

The intraclass correlation between the elements within a cluster is given by

ρ = E(yᵢⱼ − Ȳ)(yᵢₖ − Ȳ) / E(yᵢⱼ − Ȳ)²;   −1/(M − 1) ≤ ρ ≤ 1

  = [(1/(MN(M − 1))) Σ_{i=1}^{N} Σ_{j=1}^{M} Σ_{k(≠j)=1}^{M} (yᵢⱼ − Ȳ)(yᵢₖ − Ȳ)] / [(1/(MN)) Σ_{i=1}^{N} Σ_{j=1}^{M} (yᵢⱼ − Ȳ)²]

  = [Σ_{i=1}^{N} Σ_{j=1}^{M} Σ_{k(≠j)=1}^{M} (yᵢⱼ − Ȳ)(yᵢₖ − Ȳ)] / [(MN − 1)(M − 1) S²].

Consider

Σ_{i=1}^{N} (ȳᵢ − Ȳ)² = Σ_{i=1}^{N} [(1/M) Σ_{j=1}^{M} (yᵢⱼ − Ȳ)]²
 = (1/M²) Σ_{i=1}^{N} [Σ_{j=1}^{M} (yᵢⱼ − Ȳ)² + Σ_{j=1}^{M} Σ_{k(≠j)=1}^{M} (yᵢⱼ − Ȳ)(yᵢₖ − Ȳ)]

⟹ Σ_{i=1}^{N} Σ_{j=1}^{M} Σ_{k(≠j)=1}^{M} (yᵢⱼ − Ȳ)(yᵢₖ − Ȳ) = M² Σ_{i=1}^{N} (ȳᵢ − Ȳ)² − Σ_{i=1}^{N} Σ_{j=1}^{M} (yᵢⱼ − Ȳ)².
Therefore

ρ (MN − 1)(M − 1) S² = M²(N − 1) S_b² − (NM − 1) S²

or

S_b² = [(MN − 1)/(M²(N − 1))] [1 + ρ(M − 1)] S².

The variance of ȳ_cl now becomes

Var(ȳ_cl) = ((N − n)/(Nn)) S_b² = ((N − n)/(Nn)) [(MN − 1)/(N − 1)] (S²/M²) [1 + (M − 1)ρ].

For large N, (MN − 1)/(MN) ≈ 1 and (N − n)/N ≈ 1, and so

Var(ȳ_cl) ≈ (S²/(nM)) [1 + (M − 1)ρ].

The variance of the sample mean under SRSWOR for large N is

Var(ȳ_nM) ≈ S²/(nM).

The relative efficiency for large N is now given by

E = Var(ȳ_nM)/Var(ȳ_cl) = 1/[1 + (M − 1)ρ];   −1/(M − 1) ≤ ρ ≤ 1.
If M = 1 then E = 1, i.e., SRS and cluster sampling are equally efficient; each cluster consists of a single unit, so the design is SRS.

If M > 1, then cluster sampling is more efficient when

E > 1, i.e., (M − 1)ρ < 0, i.e., ρ < 0.

If ρ = 0, then E = 1, i.e., there is no gain or loss in efficiency: the units in each cluster are arranged as if at random, so the clusters are internally heterogeneous.

In practice, ρ is usually positive, and ρ decreases as M increases, but the rate of decrease in ρ is much lower in comparison to the rate of increase in M. The situation ρ > 0 arises when nearby units are grouped together to form clusters which are then completely enumerated.

There are also situations when ρ < 0.
Estimation of relative efficiency

The relative efficiency of cluster sampling relative to an equivalent SRSWOR is E = S²/(M S_b²). An estimator of E can be obtained by substituting the estimates of S² and S_b².

Since ȳ_cl = (1/n) Σ_{i=1}^{n} ȳᵢ is the mean of n cluster means drawn by SRSWOR from the population of N means ȳᵢ, i = 1,2,...,N, it follows from the theory of SRSWOR that

E(s_b²) = E[(1/(n − 1)) Σ_{i=1}^{n} (ȳᵢ − ȳ_cl)²] = (1/(N − 1)) Σ_{i=1}^{N} (ȳᵢ − Ȳ)² = S_b².

Thus s_b² is an unbiased estimator of S_b².

Since s_w² = (1/n) Σ_{i=1}^{n} Sᵢ² is the mean of the n mean sums of squares Sᵢ² drawn from the population of the N mean sums of squares Sᵢ², i = 1,2,...,N, it follows from the theory of SRSWOR that

E(s_w²) = E[(1/n) Σ_{i=1}^{n} Sᵢ²] = (1/N) Σ_{i=1}^{N} Sᵢ² = S_w².

Thus s_w² is an unbiased estimator of S_w².
Consider

S² = (1/(MN − 1)) Σ_{i=1}^{N} Σ_{j=1}^{M} (yᵢⱼ − Ȳ)²

or

(MN − 1) S² = Σ_{i=1}^{N} Σ_{j=1}^{M} [(yᵢⱼ − ȳᵢ) + (ȳᵢ − Ȳ)]²
            = Σ_{i=1}^{N} Σ_{j=1}^{M} [(yᵢⱼ − ȳᵢ)² + (ȳᵢ − Ȳ)²]
            = Σ_{i=1}^{N} (M − 1) Sᵢ² + M(N − 1) S_b²
            = N(M − 1) S_w² + M(N − 1) S_b².

An unbiased estimator of S² can therefore be obtained as

Ŝ² = (1/(MN − 1)) [N(M − 1) s_w² + M(N − 1) s_b²],

so that

Var̂(ȳ_cl) = ((N − n)/(Nn)) s_b²

Var̂(ȳ_nM) = ((N − n)/(Nn)) (Ŝ²/M)

where s_b² = (1/(n − 1)) Σ_{i=1}^{n} (ȳᵢ − ȳ_cl)².
An estimate of the efficiency E = S²/(M S_b²) is

Ê = [N(M − 1) s_w² + M(N − 1) s_b²] / [M(NM − 1) s_b²].

If N is large, so that M(N − 1) ≈ MN and MN − 1 ≈ MN, then

E ≈ 1/M + ((M − 1)/M) (S_w²/(M S_b²))

and its estimate is

Ê ≈ 1/M + ((M − 1)/M) (s_w²/(M s_b²)).
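A minimal sketch continuing the earlier equal-cluster example: estimating the efficiency Ê of cluster sampling from the sample itself, using the decomposition above.

```python
import numpy as np

def efficiency_estimate(sample_clusters, N):
    n, M = sample_clusters.shape
    ybar_i = sample_clusters.mean(axis=1)
    s_b2 = ybar_i.var(ddof=1)                           # between-cluster
    s_w2 = sample_clusters.var(axis=1, ddof=1).mean()   # (1/n) sum of s_i^2
    S2_hat = (N * (M - 1) * s_w2 + M * (N - 1) * s_b2) / (N * M - 1)
    return S2_hat / (M * s_b2)                          # E_hat

rng = np.random.default_rng(5)
pop = rng.normal(100, 15, size=(50, 20))
rows = rng.choice(50, size=8, replace=False)
print(efficiency_estimate(pop[rows], N=50))
```

A value of Ê below 1 indicates that, for this population, clustering costs precision relative to an equal-size SRSWOR.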

Sampling Theory

MODULE IX
LECTURE - 31
CLUSTER SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Estimation of a proportion in case of equal clusters

Now we consider the problem of estimating the proportion of units in the population having a specified attribute, on the basis of a sample of clusters. Let this proportion be P.

Suppose that a sample of n clusters is drawn from N clusters with SRSWOR. Defining yᵢⱼ = 1 if the jth unit in the ith cluster belongs to the specified category (i.e., possesses the given attribute) and yᵢⱼ = 0 otherwise, we find that

ȳᵢ = Pᵢ,
Ȳ = (1/N) Σ_{i=1}^{N} Pᵢ = P,
Sᵢ² = M PᵢQᵢ/(M − 1),
S_w² = [M/(N(M − 1))] Σ_{i=1}^{N} PᵢQᵢ,
S² = NMPQ/(NM − 1),
S_b² = (1/(N − 1)) Σ_{i=1}^{N} (Pᵢ − P)²
     = (1/(N − 1)) [Σ_{i=1}^{N} Pᵢ² − NP²]
     = (1/(N − 1)) [−Σ_{i=1}^{N} Pᵢ(1 − Pᵢ) + Σ_{i=1}^{N} Pᵢ − NP²]
     = (1/(N − 1)) [NPQ − Σ_{i=1}^{N} PᵢQᵢ],
where Pᵢ is the proportion of elements in the ith cluster belonging to the specified category, Qᵢ = 1 − Pᵢ, i = 1,2,...,N, and Q = 1 − P. Then, using the result that ȳ_cl is an unbiased estimator of Ȳ, we find that

P̂_cl = (1/n) Σ_{i=1}^{n} Pᵢ

is an unbiased estimator of P and

Var(P̂_cl) = [(N − n)/(Nn)] [NPQ − Σ_{i=1}^{N} PᵢQᵢ]/(N − 1).
This variance of P̂_cl can be expressed as

Var(P̂_cl) = [(N − n)/(N − 1)] (PQ/(nM)) [1 + (M − 1)ρ],

where the value of ρ can be obtained from

ρ = [M(N − 1) S_b² − N S_w²] / [(MN − 1) S²].

Substituting S_b², S_w² and S² in ρ, we obtain

ρ = 1 − [M/(M − 1)] (1/N) [Σ_{i=1}^{N} PᵢQᵢ]/(PQ).
The variance of P̂_cl can be estimated unbiasedly by

Var̂(P̂_cl) = ((N − n)/(Nn)) s_b²
           = [(N − n)/(Nn(n − 1))] Σ_{i=1}^{n} (Pᵢ − P̂_cl)²
           = [(N − n)/(Nn(n − 1))] [n P̂_cl Q̂_cl − Σ_{i=1}^{n} PᵢQᵢ]

where Q̂_cl = 1 − P̂_cl. The efficiency of cluster sampling relative to SRSWOR is given by

E = [M(N − 1)/(MN − 1)] · 1/[1 + (M − 1)ρ]
  = (N − 1) NPQ / ((NM − 1) [NPQ − Σ_{i=1}^{N} PᵢQᵢ]).

If N is large, then E ≈ (1/M) NPQ/[NPQ − Σ_{i=1}^{N} PᵢQᵢ].

An estimator of the total number of elements belonging to the specified category is obtained by multiplying P̂_cl by NM, i.e., NMP̂_cl. The expressions for its variance and variance estimator are obtained by multiplying the corresponding expressions for P̂_cl by N²M².
Case of unequal clusters:

In practice, clusters of equal size are available only when planned. For example, in a screw manufacturing company, the packets of screws can be prepared such that every packet contains the same number of screws. In real applications, it is hard to get clusters of equal size. For example, villages with equal areas are difficult to find, districts with the same number of persons are difficult to find, and the number of members in a household may not be the same in each household in a given area.
Let there be N clusters, and let Mᵢ be the size of the ith cluster. Let

M₀ = Σ_{i=1}^{N} Mᵢ
M̄ = (1/N) Σ_{i=1}^{N} Mᵢ
ȳᵢ = (1/Mᵢ) Σ_{j=1}^{Mᵢ} yᵢⱼ : mean of the ith cluster
Ȳ = (1/M₀) Σ_{i=1}^{N} Σ_{j=1}^{Mᵢ} yᵢⱼ = Σ_{i=1}^{N} (Mᵢ/M₀) ȳᵢ = (1/N) Σ_{i=1}^{N} (Mᵢ/M̄) ȳᵢ.

Suppose that n clusters are selected with SRSWOR and all elements in these selected clusters are surveyed. Assume that the Mᵢ (i = 1,2,...,N) are known.

Based on this scheme, several estimators can be constructed to estimate the population mean. We consider four types of such estimators.
1. Mean of cluster means:

Consider the simple arithmetic mean of the cluster means,

ȳ_c = (1/n) Σ_{i=1}^{n} ȳᵢ,

for which

E(ȳ_c) = (1/N) Σ_{i=1}^{N} ȳᵢ = Ȳ_N ≠ Ȳ   (where Ȳ = Σ_{i=1}^{N} (Mᵢ/M₀) ȳᵢ).

The bias of ȳ_c is

Bias(ȳ_c) = E(ȳ_c) − Ȳ
          = (1/N) Σ_{i=1}^{N} ȳᵢ − Σ_{i=1}^{N} (Mᵢ/M₀) ȳᵢ
          = −(1/M₀) [Σ_{i=1}^{N} Mᵢȳᵢ − (M₀/N) Σ_{i=1}^{N} ȳᵢ]
          = −(1/M₀) [Σ_{i=1}^{N} Mᵢȳᵢ − (1/N)(Σ_{i=1}^{N} Mᵢ)(Σ_{i=1}^{N} ȳᵢ)]
          = −(1/M₀) Σ_{i=1}^{N} (Mᵢ − M̄)(ȳᵢ − Ȳ_N)
          = −((N − 1)/M₀) S_my,

so Bias(ȳ_c) = 0 if Mᵢ and ȳᵢ are uncorrelated.
The mean squared error is

MSE(ȳ_c) = Var(ȳ_c) + [Bias(ȳ_c)]²
         = ((N − n)/(Nn)) S_b² + ((N − 1)/M₀)² S_my²

where

S_b² = (1/(N − 1)) Σ_{i=1}^{N} (ȳᵢ − Ȳ_N)²
S_my = (1/(N − 1)) Σ_{i=1}^{N} (Mᵢ − M̄)(ȳᵢ − Ȳ_N).

An estimate of Var(ȳ_c) is

Var̂(ȳ_c) = ((N − n)/(Nn)) s_b²

where s_b² = (1/(n − 1)) Σ_{i=1}^{n} (ȳᵢ − ȳ_c)².
2. Weighted mean of cluster means

Consider the arithmetic mean based on the cluster totals,

ȳ_c* = (1/(nM̄)) Σ_{i=1}^{n} Mᵢȳᵢ.

Then

E(ȳ_c*) = (1/n) Σ_{i=1}^{n} (1/M̄) E(Mᵢȳᵢ) = (n/n)(1/(NM̄)) Σ_{i=1}^{N} Mᵢȳᵢ = (1/M₀) Σ_{i=1}^{N} Σ_{j=1}^{Mᵢ} yᵢⱼ = Ȳ.

Thus ȳ_c* is an unbiased estimator of Ȳ. The variance of ȳ_c* and its estimate are given by

Var(ȳ_c*) = Var[(1/n) Σ_{i=1}^{n} (Mᵢ/M̄) ȳᵢ] = ((N − n)/(Nn)) S_b*²

Var̂(ȳ_c*) = ((N − n)/(Nn)) s_b*²

where

S_b*² = (1/(N − 1)) Σ_{i=1}^{N} ((Mᵢ/M̄) ȳᵢ − Ȳ)²
s_b*² = (1/(n − 1)) Σ_{i=1}^{n} ((Mᵢ/M̄) ȳᵢ − ȳ_c*)²
E(s_b*²) = S_b*².
Note that the expressions for the variance of ȳ_c* and its estimate can be derived directly from the theory of SRSWOR as follows. Let

zᵢ = (Mᵢ/M̄) ȳᵢ, so that ȳ_c* = (1/n) Σ_{i=1}^{n} zᵢ = z̄.

Since SRSWOR is followed,

Var(ȳ_c*) = Var(z̄) = ((N − n)/(Nn)) (1/(N − 1)) Σ_{i=1}^{N} (zᵢ − Ȳ)²
          = ((N − n)/(Nn)) (1/(N − 1)) Σ_{i=1}^{N} ((Mᵢ/M̄) ȳᵢ − Ȳ)²
          = ((N − n)/(Nn)) S_b*².

Since

E(s_b*²) = E[(1/(n − 1)) Σ_{i=1}^{n} (zᵢ − z̄)²] = (1/(N − 1)) Σ_{i=1}^{N} ((Mᵢ/M̄) ȳᵢ − Ȳ)² = S_b*²,

an unbiased estimator of the variance is easily derived.
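A minimal sketch (hypothetical data) computing both estimators above, with their variance estimates; Mbar is the known average cluster size in the population:

```python
import numpy as np

def unequal_cluster_estimates(clusters, N, Mbar):
    """clusters: list of 1-d arrays (one fully enumerated cluster each)."""
    n = len(clusters)
    M_i = np.array([c.size for c in clusters])
    ybar_i = np.array([c.mean() for c in clusters])
    fpc = (N - n) / (N * n)

    y_c = ybar_i.mean()                        # biased mean of cluster means
    var_c = fpc * ybar_i.var(ddof=1)

    z = M_i * ybar_i / Mbar                    # z_i = (M_i / Mbar) * ybar_i
    y_c_star = z.mean()                        # unbiased weighted mean
    var_c_star = fpc * z.var(ddof=1)
    return (y_c, var_c), (y_c_star, var_c_star)

rng = np.random.default_rng(7)
sizes = rng.integers(10, 40, size=12)          # n = 12 sampled clusters
clusters = [rng.normal(50, 8, size=s) for s in sizes]
print(unequal_cluster_estimates(clusters, N=100, Mbar=25.0))
```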
Sampling Theory

MODULE IX
LECTURE - 32
CLUSTER SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
3. Estimator based on ratio method of estimation

Consider the weighted mean of the cluster means

ȳ_c** = [Σ_{i=1}^{n} Mᵢȳᵢ] / [Σ_{i=1}^{n} Mᵢ].

It is easy to see that this estimator is a biased estimator of the population mean. Before deriving its bias and mean squared error, we note that this estimator can be obtained through the philosophy of the ratio method of estimation. To see this, define the study variable Uᵢ and the auxiliary variable Vᵢ as

Uᵢ = Mᵢȳᵢ/M̄,  Vᵢ = Mᵢ/M̄,  i = 1,2,...,N,

so that

V̄ = (1/N) Σ_{i=1}^{N} Vᵢ = (1/(NM̄)) Σ_{i=1}^{N} Mᵢ = 1,
ū = (1/n) Σ_{i=1}^{n} uᵢ,  v̄ = (1/n) Σ_{i=1}^{n} vᵢ.
The ratio estimator based on U and V is

Ŷ_R = (ū/v̄) V̄ = [Σ_{i=1}^{n} uᵢ] / [Σ_{i=1}^{n} vᵢ] = [Σ_{i=1}^{n} Mᵢȳᵢ] / [Σ_{i=1}^{n} Mᵢ] = ȳ_c**.

Since the ratio estimator is biased, ȳ_c** is also a biased estimator. The approximate bias and mean squared error of ȳ_c** can be derived directly by using the bias and MSE of the ratio estimator. Using the results from the ratio method of estimation, the bias up to second order of approximation is

Bias(ȳ_c**) = ((N − n)/(Nn)) (S_v²/V̄² − S_uv/(ŪV̄)) Ū = ((N − n)/(Nn)) (Ū S_v² − S_uv)

where Ū = (1/N) Σ_{i=1}^{N} Uᵢ = (1/(NM̄)) Σ_{i=1}^{N} Mᵢȳᵢ,
S_v² = (1/(N − 1)) Σ_{i=1}^{N} (Vᵢ − V̄)² = (1/(N − 1)) Σ_{i=1}^{N} (Mᵢ/M̄ − 1)²,

S_uv = (1/(N − 1)) Σ_{i=1}^{N} (Uᵢ − Ū)(Vᵢ − V̄) = (1/(N − 1)) Σ_{i=1}^{N} (Mᵢȳᵢ/M̄ − (1/(NM̄)) Σ_{k=1}^{N} M_kȳ_k)(Mᵢ/M̄ − 1),

R_uv = Ū/V̄ = Ū = (1/(NM̄)) Σ_{i=1}^{N} Mᵢȳᵢ.
The MSE of ȳ_c** up to second order of approximation can be obtained as

MSE(ȳ_c**) = ((N − n)/(Nn)) (S_u² + R_uv² S_v² − 2R_uv S_uv)

where

S_u² = (1/(N − 1)) Σ_{i=1}^{N} (Mᵢȳᵢ/M̄ − (1/(NM̄)) Σ_{k=1}^{N} M_kȳ_k)².
Alternatively,

MSE(ȳ_c**) = ((N − n)/(Nn)) (1/(N − 1)) Σ_{i=1}^{N} (Uᵢ − R_uv Vᵢ)²
           = ((N − n)/(Nn)) (1/(N − 1)) Σ_{i=1}^{N} (Mᵢ/M̄)² (ȳᵢ − (1/(NM̄)) Σ_{k=1}^{N} M_kȳ_k)².

An estimator of the MSE can be obtained as

MSÊ(ȳ_c**) = ((N − n)/(Nn)) (1/(n − 1)) Σ_{i=1}^{n} (Mᵢ/M̄)² (ȳᵢ − ȳ_c**)².

The estimator ȳ_c** is biased but consistent.
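A minimal sketch (same hypothetical data layout as before) of the ratio-type estimator with its MSE estimate:

```python
import numpy as np

def ratio_cluster_estimate(clusters, N, Mbar):
    n = len(clusters)
    M_i = np.array([c.size for c in clusters])
    ybar_i = np.array([c.mean() for c in clusters])
    y_cc = np.sum(M_i * ybar_i) / np.sum(M_i)          # ybar_c**
    mse_hat = (N - n) / (N * n * (n - 1)) * np.sum(
        (M_i / Mbar) ** 2 * (ybar_i - y_cc) ** 2)
    return y_cc, mse_hat

rng = np.random.default_rng(8)
clusters = [rng.normal(50, 8, size=s) for s in rng.integers(10, 40, size=12)]
print(ratio_cluster_estimate(clusters, N=100, Mbar=25.0))
```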
4. Estimator based on unbiased ratio type estimation

Since ȳ_c = (1/n) Σ_{i=1}^{n} ȳᵢ (where ȳᵢ = (1/Mᵢ) Σ_{j=1}^{Mᵢ} yᵢⱼ) is a biased estimator of the population mean, with

Bias(ȳ_c) = −((N − 1)/M₀) S_my = −((N − 1)/(NM̄)) S_my,

and since SRSWOR is used, the sample covariance

s_my = (1/(n − 1)) Σ_{i=1}^{n} (Mᵢ − m̄)(ȳᵢ − ȳ_c),  where m̄ = (1/n) Σ_{i=1}^{n} Mᵢ,

is an unbiased estimator of

S_my = (1/(N − 1)) Σ_{i=1}^{N} (Mᵢ − M̄)(ȳᵢ − Ȳ_N),

i.e., E(s_my) = S_my. So it follows that

E(ȳ_c) − Ȳ = −((N − 1)/(NM̄)) E(s_my),
or

E[ȳ_c + ((N − 1)/(NM̄)) s_my] = Ȳ,

so

ỹ_c = ȳ_c + ((N − 1)/(NM̄)) s_my

(written ỹ_c to distinguish it from the ratio-based estimator ȳ_c** of part 3) is an unbiased estimator of the population mean Ȳ.

This estimator is based on the unbiased ratio type estimator, obtained by using Mᵢȳᵢ/M̄ as the study variable (in place of yᵢ) and Mᵢ/M̄ as the auxiliary variable (in place of xᵢ). The exact variance of this estimator is complicated and does not reduce to a simple form. The approximate variance up to first order of approximation is

Var(ỹ_c) = (1/(n(N − 1))) Σ_{i=1}^{N} [(Mᵢ/M̄)(ȳᵢ − Ȳ) − ((1/(NM̄)) Σ_{k=1}^{N} ȳ_k)(Mᵢ − M̄)]².
A consistent estimate of this variance is

Var̂(ỹ_c) = (1/(n(n − 1))) Σ_{i=1}^{n} [(Mᵢ/M̄)(ȳᵢ − ȳ_c) − ((1/(nM̄)) Σ_{k=1}^{n} ȳ_k)(Mᵢ − (1/n) Σ_{k=1}^{n} M_k)]².

The variance of ỹ_c will be smaller than that of ȳ_c** (based on the ratio method of estimation) provided the regression coefficient of Mᵢȳᵢ/M̄ on Mᵢ/M̄ is nearer to (1/N) Σ_{i=1}^{N} ȳᵢ than to (1/M₀) Σ_{i=1}^{N} Mᵢȳᵢ.
Comparison between SRS and cluster sampling

In the case of unequal clusters, Σ_{i=1}^{n} Mᵢ is a random variable such that

E(Σ_{i=1}^{n} Mᵢ) = nM̄.

Now if a sample of size nM̄ is drawn from a population of size NM̄, then the variance of the corresponding sample mean based on SRSWOR is

Var(ȳ_SRS) = ((NM̄ − nM̄)/(NM̄ · nM̄)) S² = ((N − n)/(Nn)) (S²/M̄).

This variance can be compared with that of any of the four proposed estimators. For example, in the case of ȳ_c* = (1/(nM̄)) Σ_{i=1}^{n} Mᵢȳᵢ,

Var(ȳ_c*) = ((N − n)/(Nn)) S_b*² = ((N − n)/(Nn)) (1/(N − 1)) Σ_{i=1}^{N} ((Mᵢ/M̄) ȳᵢ − Ȳ)².

The relative efficiency of ȳ_c* relative to the SRS based sample mean is

E = Var(ȳ_SRS)/Var(ȳ_c*) = S²/(M̄ S_b*²).

For Var(ȳ_c*) < Var(ȳ_SRS), the variance between the clusters (S_b*²) should be small. So the clusters should be formed in such a way that the variation between them is as small as possible.
Sampling with replacement and unequal probabilities (PPSWR)

In many practical situations, the cluster total for the study variable is likely to be positively correlated with the number of units in the cluster. In this situation, it is advantageous to select the clusters with probability proportional to the number of units in the cluster instead of with equal probability, or to stratify the clusters according to their sizes and then to draw a SRSWOR of clusters from each stratum. We consider here the case where clusters are selected with probability proportional to the number of units in the cluster and with replacement.

Suppose that n clusters are selected with PPSWR, the size being the number of units in the cluster. Here Pᵢ is the probability of selection assigned to the ith cluster, given by

Pᵢ = Mᵢ/M₀ = Mᵢ/(NM̄),  i = 1,2,...,N.

Consider the following estimator of the population mean:

Ŷ_c = (1/n) Σ_{i=1}^{n} ȳᵢ.

This estimator can be expressed as

Ŷ_c = (1/n) Σ_{i=1}^{N} αᵢȳᵢ

where αᵢ denotes the number of times the ith cluster occurs in the sample. The random variables α₁, α₂,..., α_N follow a multinomial probability distribution with

E(αᵢ) = nPᵢ,  Var(αᵢ) = nPᵢ(1 − Pᵢ),  Cov(αᵢ, αⱼ) = −nPᵢPⱼ, i ≠ j.
Hence,

E(Ŷ_c) = (1/n) Σ_{i=1}^{N} E(αᵢ) ȳᵢ = (1/n) Σ_{i=1}^{N} nPᵢȳᵢ = Σ_{i=1}^{N} (Mᵢ/(NM̄)) ȳᵢ = [Σ_{i=1}^{N} Σ_{j=1}^{Mᵢ} yᵢⱼ]/(NM̄) = Ȳ.

Thus Ŷ_c is an unbiased estimator of Ȳ. We now derive the variance of Ŷ_c. From Ŷ_c = (1/n) Σ_{i=1}^{N} αᵢȳᵢ,

Var(Ŷ_c) = (1/n²) [Σ_{i=1}^{N} Var(αᵢ) ȳᵢ² + Σ_{i≠j} Cov(αᵢ, αⱼ) ȳᵢȳⱼ]
         = (1/n²) [n Σ_{i=1}^{N} Pᵢ(1 − Pᵢ) ȳᵢ² − n Σ_{i≠j} PᵢPⱼ ȳᵢȳⱼ]
         = (1/n) [Σ_{i=1}^{N} Pᵢȳᵢ² − (Σ_{i=1}^{N} Pᵢȳᵢ)²]
         = (1/n) Σ_{i=1}^{N} Pᵢ (ȳᵢ − Ȳ)²
         = (1/(nNM̄)) Σ_{i=1}^{N} Mᵢ (ȳᵢ − Ȳ)².
An unbiased estimator of the variance of Ŷ_c is

Var̂(Ŷ_c) = (1/(n(n − 1))) Σ_{i=1}^{n} (ȳᵢ − Ŷ_c)²,

which can be seen to satisfy the unbiasedness property as follows. Consider

E[(1/(n(n − 1))) Σ_{i=1}^{n} (ȳᵢ − Ŷ_c)²] = (1/(n(n − 1))) E[Σ_{i=1}^{n} ȳᵢ² − nŶ_c²]
 = (1/(n(n − 1))) [E(Σ_{i=1}^{N} αᵢȳᵢ²) − n Var(Ŷ_c) − nȲ²]
 = (1/(n(n − 1))) [n Σ_{i=1}^{N} Pᵢȳᵢ² − Σ_{i=1}^{N} Pᵢ(ȳᵢ − Ȳ)² − nȲ²]
 = (1/(n − 1)) [Σ_{i=1}^{N} Pᵢ(ȳᵢ² − Ȳ²) − (1/n) Σ_{i=1}^{N} Pᵢ(ȳᵢ − Ȳ)²]
 = (1/(n − 1)) [Σ_{i=1}^{N} Pᵢ(ȳᵢ − Ȳ)² − (1/n) Σ_{i=1}^{N} Pᵢ(ȳᵢ − Ȳ)²]   (since Σ Pᵢȳᵢ = Ȳ)
 = (1/n) Σ_{i=1}^{N} Pᵢ(ȳᵢ − Ȳ)²
 = Var(Ŷ_c).
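A minimal sketch (hypothetical data) of PPSWR cluster selection and the unbiased estimator above, using the cluster sizes as the size measure:

```python
import numpy as np

def ppswr_cluster_estimate(pop_clusters, n, rng):
    M_i = np.array([c.size for c in pop_clusters])
    p = M_i / M_i.sum()                                # P_i = M_i / M_0
    pick = rng.choice(len(pop_clusters), size=n, replace=True, p=p)
    ybar = np.array([pop_clusters[i].mean() for i in pick])
    Y_hat = ybar.mean()
    var_hat = np.sum((ybar - Y_hat) ** 2) / (n * (n - 1))
    return Y_hat, var_hat

rng = np.random.default_rng(9)
pop_clusters = [rng.normal(50, 8, size=s) for s in rng.integers(5, 60, size=80)]
print(ppswr_cluster_estimate(pop_clusters, n=15, rng=rng))
```

Note that selection is with replacement, so the same cluster may be enumerated more than once, exactly as the multinomial counts αᵢ describe.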
Sampling Theory
MODULE X
LECTURE - 33
TWO STAGE SAMPLING
(SUB SAMPLING)

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
In cluster sampling, all the elements in the selected clusters are surveyed. Moreover, the efficiency in cluster sampling depends on the size of the cluster: as the size increases, the efficiency decreases. This suggests that higher precision can be attained by distributing a given number of elements over a large number of clusters and then taking a small number of clusters and enumerating all elements within them. This is achieved in subsampling.

In subsampling:

• divide the population into clusters;
• select a sample of clusters [first stage];
• from each of the selected clusters, select a sample of a specified number of elements [second stage].

The clusters which form the units of sampling at the first stage are called the first stage units, and the units or groups of units within clusters which form the units of sampling at the second stage are called the second stage units or subunits.

The procedure is generalized to three or more stages and is then termed multistage sampling.

For example, in a crop survey,

• villages are the first stage units,
• fields within the villages are the second stage units, and
• plots within the fields are the third stage units.

In another example, to obtain a sample of fishes from a commercial fishery,

• first take a sample of boats, and
• then take a sample of fishes from each selected boat.
Two stage sampling with equal first stage units

Assume that

• the population consists of NM elements;
• the NM elements are grouped into N first stage units of M second stage units each (i.e., N clusters, each cluster of size M);
• a sample of n first stage units is selected (i.e., choose n clusters);
• a sample of m second stage units is selected from each selected first stage unit (i.e., choose m units from each cluster);
• units at each stage are selected with SRSWOR.

Cluster sampling is a special case of two stage sampling in the sense that, from a population of N clusters of equal size m = M, a sample of n clusters is chosen.

If further M = m = 1, we get SRSWOR. If n = N, we have the case of stratified sampling.


yᵢⱼ : value of the characteristic under study for the jth second stage unit of the ith first stage unit; i = 1,2,...,N; j = 1,2,...,M.

Ȳᵢ = (1/M) Σ_{j=1}^{M} yᵢⱼ : mean per second stage unit of the ith first stage unit in the population.

Ȳ = (1/(MN)) Σ_{i=1}^{N} Σ_{j=1}^{M} yᵢⱼ = (1/N) Σ_{i=1}^{N} Ȳᵢ = Ȳ_MN : mean per second stage unit in the population.

ȳᵢ = (1/m) Σ_{j=1}^{m} yᵢⱼ : mean per second stage unit in the ith first stage unit in the sample.

ȳ = (1/(mn)) Σ_{i=1}^{n} Σ_{j=1}^{m} yᵢⱼ = (1/n) Σ_{i=1}^{n} ȳᵢ = ȳ_mn : mean per second stage unit in the sample.

Advantages

The principal advantage of two stage sampling is that it is more flexible than one stage sampling. It reduces to one stage sampling when m = M, but unless this is the best choice of m, we have the opportunity of taking some smaller value that appears more efficient. As usual, this choice reduces to a balance between statistical precision and cost. When the units of the first stage agree very closely, then considerations of precision suggest a small value of m. On the other hand, it is sometimes as cheap to measure the whole of a unit as to take a sample from it, for example, when the unit is a household and a single respondent can give data as accurate as those from all the members of the household.
A pictorial scheme of the two stage sampling scheme is as follows:

Population (MN units; N clusters of M units each)
 → First stage sample: n clusters (small), each of M units
 → Second stage sample: the same n clusters, m units taken from each (mn units in total).
Note: The expectations under a two stage sampling scheme depend on the stages. For example, the expectation at the second stage is conditional on the first stage in the sense that a second stage unit can be in the sample only if its first stage unit was selected at the first stage.

To calculate the average:

• first average the estimator over all the second stage selections that can be drawn from a fixed set of n units that the plan selects;
• then average over all the possible selections of n units by the plan.

In case of two stage sampling,

E(θ̂) = E₁[E₂(θ̂)],

where E₁ is the average over all first stage samples and E₂ is the average over all possible second stage selections from a fixed set of units.

In case of three stage sampling,

E(θ̂) = E₁[E₂{E₃(θ̂)}].

To calculate the variance, we proceed as follows. In case of two stage sampling,

Var(θ̂) = E(θ̂ − θ)² = E₁E₂(θ̂ − θ)².
Consider

E₂(θ̂ − θ)² = E₂(θ̂²) − 2θE₂(θ̂) + θ²
            = [{E₂(θ̂)}² + V₂(θ̂)] − 2θE₂(θ̂) + θ².

Now average over the first stage selection:

E₁E₂(θ̂ − θ)² = E₁[{E₂(θ̂)}²] + E₁[V₂(θ̂)] − 2θE₁E₂(θ̂) + θ²
             = E₁[{E₂(θ̂) − E₁E₂(θ̂)}²] + E₁[V₂(θ̂)],

so that

Var(θ̂) = V₁[E₂(θ̂)] + E₁[V₂(θ̂)].

In case of three stage sampling,

Var(θ̂) = V₁[E₂{E₃(θ̂)}] + E₁[V₂{E₃(θ̂)}] + E₁[E₂{V₃(θ̂)}].
Estimation of population mean

Consider ȳ = ȳ_mn as an estimator of the population mean Ȳ.

Bias:

E(ȳ) = E₁[E₂(ȳ_mn)]
     = E₁[(1/n) Σ_{i=1}^{n} E₂(ȳᵢ | i)]   (as the 2nd stage depends on the 1st stage)
     = E₁[(1/n) Σ_{i=1}^{n} Ȳᵢ]   (as ȳᵢ is unbiased for Ȳᵢ under SRSWOR)
     = (1/N) Σ_{i=1}^{N} Ȳᵢ
     = Ȳ.

Thus ȳ_mn is an unbiased estimator of the population mean.


Variance

Var(ȳ) = E₁[V₂(ȳ | i)] + V₁[E₂(ȳ | i)]
       = E₁[V₂{(1/n) Σ_{i=1}^{n} ȳᵢ | i}] + V₁[E₂{(1/n) Σ_{i=1}^{n} ȳᵢ | i}]
       = E₁[(1/n²) Σ_{i=1}^{n} V₂(ȳᵢ | i)] + V₁[(1/n) Σ_{i=1}^{n} E₂(ȳᵢ | i)]
       = E₁[(1/n²) Σ_{i=1}^{n} (1/m − 1/M) Sᵢ²] + V₁[(1/n) Σ_{i=1}^{n} Ȳᵢ]
       = (1/n²) n (1/m − 1/M) S_w² + ((N − n)/(Nn)) S_b²   (the second term is the variance of a mean of cluster means, as in cluster sampling)
       = (1/n)(1/m − 1/M) S_w² + (1/n − 1/N) S_b²

where

S_w² = (1/N) Σ_{i=1}^{N} Sᵢ² = (1/(N(M − 1))) Σ_{i=1}^{N} Σ_{j=1}^{M} (yᵢⱼ − Ȳᵢ)²
S_b² = (1/(N − 1)) Σ_{i=1}^{N} (Ȳᵢ − Ȳ)².
Estimate of variance

An unbiased estimator of the variance of ȳ can be obtained by replacing S_b² and S_w² by their unbiased estimators in the expression for the variance of ȳ. Consider estimating

S_w² = (1/N) Σ_{i=1}^{N} Sᵢ²,  where Sᵢ² = (1/(M − 1)) Σ_{j=1}^{M} (yᵢⱼ − Ȳᵢ)²,

by

s_w² = (1/n) Σ_{i=1}^{n} sᵢ²,  where sᵢ² = (1/(m − 1)) Σ_{j=1}^{m} (yᵢⱼ − ȳᵢ)².
Then

E(s_w²) = E₁E₂(s_w² | i)
        = E₁[(1/n) Σ_{i=1}^{n} E₂(sᵢ² | i)]
        = E₁[(1/n) Σ_{i=1}^{n} Sᵢ²]   (as SRSWOR is used at the second stage)
        = (1/n) · n · (1/N) Σ_{i=1}^{N} Sᵢ²
        = S_w²,

so s_w² is an unbiased estimator of S_w².

Consider

s_b² = (1/(n − 1)) Σ_{i=1}^{n} (ȳᵢ − ȳ)²

as an estimator of

S_b² = (1/(N − 1)) Σ_{i=1}^{N} (Ȳᵢ − Ȳ)².
So

(n − 1) E(s_b²) = E[Σ_{i=1}^{n} ȳᵢ² − nȳ²] = E[Σ_{i=1}^{n} ȳᵢ²] − n[Var(ȳ) + {E(ȳ)}²].

Now

E[Σ_{i=1}^{n} ȳᵢ²] = E₁[Σ_{i=1}^{n} E₂(ȳᵢ² | i)]
                  = E₁[Σ_{i=1}^{n} {V₂(ȳᵢ | i) + (E₂(ȳᵢ | i))²}]
                  = E₁[Σ_{i=1}^{n} {(1/m − 1/M) Sᵢ² + Ȳᵢ²}]
                  = n [(1/m − 1/M)(1/N) Σ_{i=1}^{N} Sᵢ² + (1/N) Σ_{i=1}^{N} Ȳᵢ²]
                  = n [(1/m − 1/M) S_w² + (1/N) Σ_{i=1}^{N} Ȳᵢ²],

and

n [Var(ȳ) + {E(ȳ)}²] = n [(1/n − 1/N) S_b² + (1/n)(1/m − 1/M) S_w² + Ȳ²].

Therefore

(n − 1) E(s_b²) = (n − 1)(1/m − 1/M) S_w² + (n/N) Σ_{i=1}^{N} Ȳᵢ² − nȲ² − n(1/n − 1/N) S_b²
                = (n − 1)(1/m − 1/M) S_w² + (n/N)[Σ_{i=1}^{N} Ȳᵢ² − NȲ²] − n(1/n − 1/N) S_b²
                = (n − 1)(1/m − 1/M) S_w² + (n/N)(N − 1) S_b² − n(1/n − 1/N) S_b²
                = (n − 1)(1/m − 1/M) S_w² + (n − 1) S_b²

⟹ E(s_b²) = (1/m − 1/M) S_w² + S_b²

or

E[s_b² − (1/m − 1/M) s_w²] = S_b².

Thus

Var̂(ȳ) = (1/n)(1/m − 1/M) s_w² + (1/n − 1/N)[s_b² − (1/m − 1/M) s_w²]
        = (1/N)(1/m − 1/M) s_w² + (1/n − 1/N) s_b².
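A minimal sketch (hypothetical data) of the two stage estimator and the unbiased variance estimate just derived:

```python
import numpy as np

def two_stage_estimate(pop, n, m, rng):
    """pop: (N, M) array, N first stage units of M elements each;
    SRSWOR at both stages."""
    N, M = pop.shape
    units = rng.choice(N, size=n, replace=False)          # first stage
    sub = np.stack([rng.choice(pop[i], size=m, replace=False) for i in units])
    ybar_i = sub.mean(axis=1)
    y_mn = ybar_i.mean()
    s_b2 = ybar_i.var(ddof=1)
    s_w2 = sub.var(axis=1, ddof=1).mean()
    var_hat = (1/N) * (1/m - 1/M) * s_w2 + (1/n - 1/N) * s_b2
    return y_mn, var_hat

rng = np.random.default_rng(10)
pop = rng.normal(100, 15, size=(60, 30))                  # N = 60, M = 30
print(two_stage_estimate(pop, n=10, m=6, rng=rng))
```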
Sampling Theory
MODULE X
LECTURE - 34
TWO STAGE SAMPLING
(SUB SAMPLING)

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Allocation of sample to the two stages: equal first stage units

The variance of the sample mean in the case of two stage sampling is

Var(ȳ) = (1/n)(1/m − 1/M) S_w² + (1/n − 1/N) S_b².

It depends on S_b², S_w², n and m, and the cost of the survey depends on n and m.

Case 1. When cost is fixed

We find the values of n and m so that the variance is minimum for a given cost.

(I) When the cost function is C = kmn

Let the cost of the survey be proportional to the sample size, C = knm, where C is the total cost and k is a constant. When the cost is fixed as C = C₀, substituting m = C₀/(kn) in Var(ȳ) gives

Var(ȳ) = (1/n)[S_b² − S_w²/M] − S_b²/N + (k/C₀) S_w²
       = (1/n)[S_b² − S_w²/M] + [(k S_w²/C₀) − S_b²/N].

This variance is a monotone decreasing function of n if (S_b² − S_w²/M) > 0, and the variance is minimum when n assumes its maximum value, i.e., n̂ = C₀/k, corresponding to m = 1.

If (S_b² − S_w²/M) < 0 (i.e., the intraclass correlation is negative for large N), then the variance is a monotone increasing function of n. It reaches its minimum when n assumes its minimum value, i.e., m̂ = M and n̂ = C₀/(kM) (no subsampling).
(II) When the cost function is C = k₁n + k₂mn

Let the cost be fixed as C₀ = k₁n + k₂mn, where k₁ and k₂ are positive constants denoting the costs per unit of observation at the first and second stages respectively. Minimize the variance of the sample mean under two stage sampling with respect to m subject to the restriction C₀ = k₁n + k₂mn. We have

C₀ [Var(ȳ) + S_b²/N] = k₁(S_b² − S_w²/M) + k₂S_w² + mk₂(S_b² − S_w²/M) + k₁S_w²/m.

When (S_b² − S_w²/M) > 0, then

C₀ [Var(ȳ) + S_b²/N] = [√(k₁(S_b² − S_w²/M)) + √(k₂S_w²)]² + [√(mk₂(S_b² − S_w²/M)) − √(k₁S_w²/m)]²,

which is minimum when the second term on the right hand side is zero. So we obtain

m̂ = √(k₁/k₂) · S_w/√(S_b² − S_w²/M).

The optimum n then follows from C₀ = k₁n + k₂mn as n̂ = C₀/(k₁ + k₂m̂).
When (S_b² − S_w²/M) < 0,

C₀ [Var(ȳ) + S_b²/N] = k₁(S_b² − S_w²/M) + k₂S_w² + mk₂(S_b² − S_w²/M) + k₁S_w²/m

is minimum if m is the greatest attainable integer. Hence in this case, when C₀ ≥ k₁ + k₂M, m̂ = M and n̂ = C₀/(k₁ + k₂M).

If C₀ < k₁ + k₂M, then m̂ = (C₀ − k₁)/k₂ and n̂ = 1.

If N is large, then

S w2  S 2 (1   )
S w2
Sb2   S2
M
k1  1 
mˆ    1.
k2    4
Case 2: When variance is fixed

Now we find the sample sizes when the variance is fixed, say as V₀:

V₀ = (1/n)(1/m − 1/M) S_w² + (1/n − 1/N) S_b²

⟹ n = [S_b² + (1/m − 1/M) S_w²] / (V₀ + S_b²/N).

So, for the cost function C = kmn,

C = km [S_b² + (1/m − 1/M) S_w²] / (V₀ + S_b²/N) = k [m(S_b² − S_w²/M) + S_w²] / (V₀ + S_b²/N).

If (S_b² − S_w²/M) > 0, C attains its minimum when m assumes the smallest integral value, i.e., m̂ = 1.

If (S_b² − S_w²/M) < 0, C attains its minimum when m̂ = M.
Comparison of two stage sampling with one stage sampling

One stage sampling procedures are comparable with two stage sampling procedures when either

(i) mn elements are sampled in one single stage, or
(ii) mn/M first stage units are sampled as clusters without subsampling.

We consider both the cases.

Case 1: Sampling mn elements in one single stage

The variance of the sample mean based on mn elements selected by SRSWOR (one stage) is given by

Var(ȳ_SRS) = (1/(mn) − 1/(MN)) S².

The variance under two stage sampling is given by

Var(ȳ_TS) = (1/n)(1/m − 1/M) S_w² + (1/n − 1/N) S_b².

The intraclass correlation coefficient is

ρ = [M(N − 1) S_b² − N S_w²] / [(MN − 1) S²];   −1/(M − 1) ≤ ρ ≤ 1,
and using the identity

Σ_{i=1}^{N} Σ_{j=1}^{M} (yᵢⱼ − Ȳ)² = Σ_{i=1}^{N} Σ_{j=1}^{M} (yᵢⱼ − Ȳᵢ)² + M Σ_{i=1}^{N} (Ȳᵢ − Ȳ)²,

where Ȳ = (1/(MN)) Σ_{i=1}^{N} Σ_{j=1}^{M} yᵢⱼ and Ȳᵢ = (1/M) Σ_{j=1}^{M} yᵢⱼ, we have

(MN − 1) S² = N(M − 1) S_w² + M(N − 1) S_b².

Eliminating S_w²,

S_b² = [(MN − 1)/(M²(N − 1))] [1 + (M − 1)ρ] S²,

and likewise

S_w² = [(MN − 1)/(MN)] (1 − ρ) S².

Substituting S_b² and S_w² in Var(ȳ_TS),

Var(ȳ_TS) = [(MN − 1)/(MN)] (S²/(mn)) [(1 − m/M)(1 − ρ) + (m(N − n)/(M(N − 1))) {1 + (M − 1)ρ}].
When the subsampling rate m/M is small, MN − 1 ≈ MN and M − 1 ≈ M, then

Var(ȳ_SRS) ≈ S²/(mn)
Var(ȳ_TS) ≈ (S²/(mn)) [1 + ((N − n)/(N − 1))(m − 1)ρ].

The relative efficiency of two stage sampling in relation to one stage sampling by SRSWOR is

RE = Var(ȳ_TS)/Var(ȳ_SRS) = 1 + ((N − n)/(N − 1))(m − 1)ρ.

If N − 1 ≈ N and the finite population correction is ignorable, then (N − n)/(N − 1) ≈ (N − n)/N ≈ 1, and

RE ≈ 1 + ρ(m − 1).
Case 2: Comparison with cluster sampling

Suppose a random sample of mn/M clusters, without further subsampling, is selected. The variance of the sample mean of the equivalent mn/M clusters is

Var(ȳ_cl) = (M/(mn) − 1/N) S_b².

The variance of the sample mean under two stage sampling is

Var(ȳ_TS) = (1/n)(1/m − 1/M) S_w² + (1/n − 1/N) S_b².

So Var(ȳ_cl) exceeds Var(ȳ_TS) by

(1/n)(M/m − 1)[S_b² − (1/M) S_w²],

which is approximately

(1/n)(M/m − 1) ρS²   for large N and (S_b² − S_w²/M) > 0,

using S_b² ≈ (1/M)[1 + (M − 1)ρ]S² and S_w² ≈ (1 − ρ)S², so that S_b² − S_w²/M ≈ ρS².

So the smaller the ratio m/M, the larger the reduction in the variance of the two stage sample over a cluster sample. When (S_b² − S_w²/M) < 0, the subsampling leads to a loss in precision.
Sampling Theory
MODULE X
LECTURE - 35
TWO STAGE SAMPLING
(SUB SAMPLING)

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Two stage sampling with unequal first stage units:

Consider two stage sampling when the first stage units are of unequal size and SRSWOR is employed at each stage. Let

yᵢⱼ : value of the jth second stage unit of the ith first stage unit;
Mᵢ : number of second stage units in the ith first stage unit;
M₀ = Σ_{i=1}^{N} Mᵢ : total number of second stage units in the population;
mᵢ : number of second stage units to be selected from the ith first stage unit, if it is in the sample;
m₀ = Σ_{i=1}^{n} mᵢ : total number of second stage units in the sample;

ȳᵢ(mᵢ) = (1/mᵢ) Σ_{j=1}^{mᵢ} yᵢⱼ
Ȳᵢ = (1/Mᵢ) Σ_{j=1}^{Mᵢ} yᵢⱼ
Ȳ_N = (1/N) Σ_{i=1}^{N} Ȳᵢ
Ȳ = [Σ_{i=1}^{N} Σ_{j=1}^{Mᵢ} yᵢⱼ]/M₀ = [Σ_{i=1}^{N} MᵢȲᵢ]/(NM̄) = (1/N) Σ_{i=1}^{N} uᵢȲᵢ

where uᵢ = Mᵢ/M̄ and M̄ = (1/N) Σ_{i=1}^{N} Mᵢ.
The pictorial scheme of two stage sampling with unequal first stage units is as follows:

Population (M₀ units; N clusters of sizes M₁, M₂,..., M_N)
 → First stage sample: n clusters (small)
 → Second stage sample: m₁, m₂,..., mₙ units taken from the n selected clusters.
Now we consider different estimators for the estimation of the population mean.

1. Estimator based on the first stage unit means in the sample:

Ŷ = ȳ_S2 = (1/n) Σ_{i=1}^{n} ȳᵢ(mᵢ)

Bias:

E(ȳ_S2) = E₁[(1/n) Σ_{i=1}^{n} E₂(ȳᵢ(mᵢ))]
        = E₁[(1/n) Σ_{i=1}^{n} Ȳᵢ]   [since a sample of size mᵢ is selected out of Mᵢ units by SRSWOR]
        = (1/N) Σ_{i=1}^{N} Ȳᵢ
        = Ȳ_N ≠ Ȳ.

So ȳ_S2 is a biased estimator of Ȳ, and its bias is given by

Bias(ȳ_S2) = E(ȳ_S2) − Ȳ
           = (1/N) Σ_{i=1}^{N} Ȳᵢ − (1/(NM̄)) Σ_{i=1}^{N} MᵢȲᵢ
           = −(1/(NM̄)) [Σ_{i=1}^{N} MᵢȲᵢ − (1/N)(Σ_{i=1}^{N} Ȳᵢ)(Σ_{i=1}^{N} Mᵢ)]
           = −(1/(NM̄)) Σ_{i=1}^{N} (Mᵢ − M̄)(Ȳᵢ − Ȳ_N).
This bias can be estimated by

Biaŝ(ȳ_S2) = −[(N − 1)/(NM̄)] (1/(n − 1)) Σ_{i=1}^{n} (Mᵢ − m̄)(ȳᵢ(mᵢ) − ȳ_S2),  where m̄ = (1/n) Σ_{i=1}^{n} Mᵢ,

which can be seen as follows:

E[Biaŝ(ȳ_S2)] = −[(N − 1)/(NM̄)] E₁[(1/(n − 1)) Σ_{i=1}^{n} E₂{(Mᵢ − m̄)(ȳᵢ(mᵢ) − ȳ_S2) | n}]
              = −[(N − 1)/(NM̄)] E₁[(1/(n − 1)) Σ_{i=1}^{n} (Mᵢ − m̄)(Ȳᵢ − ȳ_n)]   (where ȳ_n = (1/n) Σ_{i=1}^{n} Ȳᵢ)
              = −(1/(NM̄)) Σ_{i=1}^{N} (Mᵢ − M̄)(Ȳᵢ − Ȳ_N)
              = Ȳ_N − Ȳ = Bias(ȳ_S2).

An unbiased estimator of the population mean Ȳ is thus obtained as

ȳ_S2 + [(N − 1)/(NM̄)] (1/(n − 1)) Σ_{i=1}^{n} (Mᵢ − m̄)(ȳᵢ(mᵢ) − ȳ_S2).

Note that the bias arises due to the inequality of the sizes of the first stage units, and the probability of selection of the second stage units varies from one first stage unit to another.
Variance:

Var(ȳ_S2) = V₁[E₂(ȳ_S2 | i)] + E₁[V₂(ȳ_S2 | i)]
          = V₁[(1/n) Σ_{i=1}^{n} Ȳᵢ] + E₁[(1/n²) Σ_{i=1}^{n} V₂(ȳᵢ(mᵢ) | i)]
          = (1/n − 1/N) S_b² + E₁[(1/n²) Σ_{i=1}^{n} (1/mᵢ − 1/Mᵢ) Sᵢ²]
          = (1/n − 1/N) S_b² + (1/(Nn)) Σ_{i=1}^{N} (1/mᵢ − 1/Mᵢ) Sᵢ²

where

S_b² = (1/(N − 1)) Σ_{i=1}^{N} (Ȳᵢ − Ȳ_N)²
Sᵢ² = (1/(Mᵢ − 1)) Σ_{j=1}^{Mᵢ} (yᵢⱼ − Ȳᵢ)².

The MSE can be obtained as

MSE(ȳ_S2) = Var(ȳ_S2) + [Bias(ȳ_S2)]².
Estimation of variance:

Consider the mean square between the cluster means in the sample,

s_b² = (1/(n − 1)) Σ_{i=1}^{n} (ȳᵢ(mᵢ) − ȳ_S2)².

It can be shown that

E(s_b²) = S_b² + (1/N) Σ_{i=1}^{N} (1/mᵢ − 1/Mᵢ) Sᵢ².

Also, with sᵢ² = (1/(mᵢ − 1)) Σ_{j=1}^{mᵢ} (yᵢⱼ − ȳᵢ(mᵢ))², we have E(sᵢ²) = Sᵢ² = (1/(Mᵢ − 1)) Σ_{j=1}^{Mᵢ} (yᵢⱼ − Ȳᵢ)², so

E[(1/n) Σ_{i=1}^{n} (1/mᵢ − 1/Mᵢ) sᵢ²] = (1/N) Σ_{i=1}^{N} (1/mᵢ − 1/Mᵢ) Sᵢ².

Thus

E(s_b²) = S_b² + E[(1/n) Σ_{i=1}^{n} (1/mᵢ − 1/Mᵢ) sᵢ²],

and an unbiased estimator of S_b² is

Ŝ_b² = s_b² − (1/n) Σ_{i=1}^{n} (1/mᵢ − 1/Mᵢ) sᵢ².

So an estimator of the variance can be obtained by replacing S_b² and Sᵢ² by their unbiased estimators as

Var̂(ȳ_S2) = (1/n − 1/N) Ŝ_b² + (1/(Nn)) Σ_{i=1}^{N} (1/mᵢ − 1/Mᵢ) Ŝᵢ².
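A minimal sketch (hypothetical data) of ȳ_S2 with its bias estimate and a variance estimate. The population sum over i = 1,...,N in Var̂ is replaced here by its sample-based counterpart (an assumption of this sketch, since sᵢ² is only available for sampled units):

```python
import numpy as np

def unequal_two_stage(samples, M_i, N, Mbar):
    """samples: list of stage-2 SRSWOR subsamples (1-d arrays), one per
    selected first stage unit; M_i: sizes of those first stage units."""
    n = len(samples)
    m_i = np.array([s.size for s in samples])
    ybar_i = np.array([s.mean() for s in samples])
    s_i2 = np.array([s.var(ddof=1) for s in samples])

    y_s2 = ybar_i.mean()
    mbar = np.mean(M_i)
    s_my = np.sum((M_i - mbar) * (ybar_i - y_s2)) / (n - 1)
    bias_hat = -(N - 1) / (N * Mbar) * s_my     # subtract to bias-correct y_s2

    corr = np.mean((1 / m_i - 1 / M_i) * s_i2)  # (1/n) sum of the fpc terms
    Sb2_hat = ybar_i.var(ddof=1) - corr
    var_hat = (1 / n - 1 / N) * Sb2_hat + corr / n   # sample-based stand-in
    return y_s2, bias_hat, var_hat

rng = np.random.default_rng(11)
M_i = rng.integers(20, 80, size=9)              # sizes of the n = 9 sampled units
samples = [rng.normal(60, 10, size=max(5, Mi // 4)) for Mi in M_i]
print(unequal_two_stage(samples, M_i, N=100, Mbar=50.0))
```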
2. Estimator based on first stage unit totals:

Ŷ = ȳ*_S2 = (1/n) Σ_{i=1}^{n} Mᵢȳᵢ(mᵢ)/M̄ = (1/n) Σ_{i=1}^{n} uᵢȳᵢ(mᵢ),

where uᵢ = Mᵢ/M̄.

Bias:

E(ȳ*_S2) = E₁[(1/n) Σ_{i=1}^{n} uᵢ E₂(ȳᵢ(mᵢ) | i)] = E₁[(1/n) Σ_{i=1}^{n} uᵢȲᵢ] = (1/N) Σ_{i=1}^{N} uᵢȲᵢ = Ȳ.

Thus ȳ*_S2 is an unbiased estimator of Ȳ.
Variance:

Var(ȳ*_S2) = V₁[E₂(ȳ*_S2 | n)] + E₁[V₂(ȳ*_S2 | n)]
           = V₁[(1/n) Σ_{i=1}^{n} uᵢȲᵢ] + E₁[(1/n²) Σ_{i=1}^{n} uᵢ² V₂(ȳᵢ(mᵢ) | i)]
           = (1/n − 1/N) S_b*² + (1/(nN)) Σ_{i=1}^{N} uᵢ² (1/mᵢ − 1/Mᵢ) Sᵢ²

where

Sᵢ² = (1/(Mᵢ − 1)) Σ_{j=1}^{Mᵢ} (yᵢⱼ − Ȳᵢ)²
S_b*² = (1/(N − 1)) Σ_{i=1}^{N} (uᵢȲᵢ − Ȳ)².
3. Estimator based on ratio estimator:

Ŷ = ȳ**_S2 = [Σ_{i=1}^{n} Mᵢȳᵢ(mᵢ)] / [Σ_{i=1}^{n} Mᵢ] = [Σ_{i=1}^{n} uᵢȳᵢ(mᵢ)] / [Σ_{i=1}^{n} uᵢ] = ȳ*_S2/ūₙ,

where uᵢ = Mᵢ/M̄ and ūₙ = (1/n) Σ_{i=1}^{n} uᵢ.

This estimator can be seen as arising from the ratio method of estimation as follows. Let

yᵢ* = uᵢȳᵢ(mᵢ),  xᵢ* = Mᵢ/M̄ = uᵢ,  i = 1,2,...,N,

be the values of the study variable and the auxiliary variable in reference to the ratio method of estimation. Then

ȳ* = (1/n) Σ_{i=1}^{n} yᵢ* = ȳ*_S2
x̄* = (1/n) Σ_{i=1}^{n} xᵢ* = ūₙ
X̄* = (1/N) Σ_{i=1}^{N} xᵢ* = 1.

The corresponding ratio estimator of Ȳ is

Ŷ_R = (ȳ*/x̄*) X̄* = (ȳ*_S2/ūₙ) · 1 = ȳ**_S2,

so the bias and mean squared error of ȳ**_S2 can be obtained directly from the results for the ratio estimator.
Recall that in the ratio method of estimation, the bias of the ratio estimator up to second order of approximation is

Bias(Ŷ_R) ≈ ((N − n)/(Nn)) Ȳ (C_x² − ρC_xC_y) = Ȳ [Var(x̄)/X̄² − Cov(x̄, ȳ)/(X̄Ȳ)]

MSE(Ŷ_R) ≈ Var(ȳ) + R² Var(x̄) − 2R Cov(x̄, ȳ),  where R = Ȳ/X̄.

Bias:

The bias of ȳ**_S2 up to second order of approximation is therefore

Bias(ȳ**_S2) = Ȳ [Var(x̄*_S2)/X̄² − Cov(x̄*_S2, ȳ*_S2)/(X̄Ȳ)]

where x̄*_S2 = (1/n) Σ_{i=1}^{n} uᵢx̄ᵢ(mᵢ) is the mean of the auxiliary variable defined similarly to ȳ*_S2. Now we find Cov(x̄*_S2, ȳ*_S2):

Cov(x̄*_S2, ȳ*_S2) = Cov[E₂{(1/n) Σ_{i=1}^{n} uᵢx̄ᵢ(mᵢ) | n}, E₂{(1/n) Σ_{i=1}^{n} uᵢȳᵢ(mᵢ) | n}] + E₁[Cov₂{(1/n) Σ_{i=1}^{n} uᵢx̄ᵢ(mᵢ), (1/n) Σ_{i=1}^{n} uᵢȳᵢ(mᵢ) | n}]
 = Cov[(1/n) Σ_{i=1}^{n} uᵢX̄ᵢ, (1/n) Σ_{i=1}^{n} uᵢȲᵢ] + E₁[(1/n²) Σ_{i=1}^{n} uᵢ² (1/mᵢ − 1/Mᵢ) S_ixy]
 = (1/n − 1/N) S*_bxy + (1/(nN)) Σ_{i=1}^{N} uᵢ² (1/mᵢ − 1/Mᵢ) S_ixy
where

$$S_{bxy}^{*} = \frac{1}{N-1}\sum_{i=1}^{N}(u_i\bar{X}_i - \bar{X})(u_i\bar{Y}_i - \bar{Y}), \qquad S_{ixy} = \frac{1}{M_i-1}\sum_{j=1}^{M_i}(x_{ij}-\bar{X}_i)(y_{ij}-\bar{Y}_i).$$

Similarly, $Var(\bar{x}_{S2}^{*})$ can be obtained by replacing $x$ in place of $y$ in $Cov(\bar{x}_{S2}^{*}, \bar{y}_{S2}^{*})$ as

$$Var(\bar{x}_{S2}^{*}) = \left(\frac{1}{n}-\frac{1}{N}\right)S_{bx}^{*2} + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ix}^2$$

where

$$S_{bx}^{*2} = \frac{1}{N-1}\sum_{i=1}^{N}(u_i\bar{X}_i - \bar{X})^2, \qquad S_{ix}^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}(x_{ij}-\bar{X}_i)^2.$$

Substituting $Cov(\bar{x}_{S2}^{*}, \bar{y}_{S2}^{*})$ and $Var(\bar{x}_{S2}^{*})$ in $Bias(\bar{y}_{S2}^{**})$, we obtain the approximate bias as

$$Bias(\bar{y}_{S2}^{**}) \approx \bar{Y}\left[\left(\frac{1}{n}-\frac{1}{N}\right)\left(\frac{S_{bx}^{*2}}{\bar{X}^2} - \frac{S_{bxy}^{*}}{\bar{X}\bar{Y}}\right) + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left(\frac{S_{ix}^2}{\bar{X}^2} - \frac{S_{ixy}}{\bar{X}\bar{Y}}\right)\right].$$
Mean squared error:

$$MSE(\bar{y}_{S2}^{**}) \approx Var(\bar{y}_{S2}^{*}) - 2R^{*}\,Cov(\bar{x}_{S2}^{*}, \bar{y}_{S2}^{*}) + R^{*2}\,Var(\bar{x}_{S2}^{*})$$

with

$$\begin{aligned}
Var(\bar{y}_{S2}^{*}) &= \left(\frac{1}{n}-\frac{1}{N}\right)S_{by}^{*2} + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{iy}^2\\
Var(\bar{x}_{S2}^{*}) &= \left(\frac{1}{n}-\frac{1}{N}\right)S_{bx}^{*2} + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ix}^2\\
Cov(\bar{x}_{S2}^{*}, \bar{y}_{S2}^{*}) &= \left(\frac{1}{n}-\frac{1}{N}\right)S_{bxy}^{*} + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ixy}
\end{aligned}$$

where

$$S_{by}^{*2} = \frac{1}{N-1}\sum_{i=1}^{N}(u_i\bar{Y}_i - \bar{Y})^2, \qquad S_{iy}^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}(y_{ij}-\bar{Y}_i)^2, \qquad R^{*} = \frac{\bar{Y}}{\bar{X}} = \bar{Y} \;\; (\text{since } \bar{X} = \bar{X}^{*} = 1 \text{ here}).$$

Thus

$$MSE(\bar{y}_{S2}^{**}) \approx \left(\frac{1}{n}-\frac{1}{N}\right)\left(S_{by}^{*2} - 2R^{*}S_{bxy}^{*} + R^{*2}S_{bx}^{*2}\right) + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left(S_{iy}^2 - 2R^{*}S_{ixy} + R^{*2}S_{ix}^2\right).$$

Also,

$$MSE(\bar{y}_{S2}^{**}) \approx \left(\frac{1}{n}-\frac{1}{N}\right)\frac{1}{N-1}\sum_{i=1}^{N}u_i^2\left(\bar{Y}_i - R^{*}\bar{X}_i\right)^2 + \frac{1}{nN}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left(S_{iy}^2 - 2R^{*}S_{ixy} + R^{*2}S_{ix}^2\right).$$
Estimate of variance:

Consider

$$s_{bxy}^{*} = \frac{1}{n-1}\sum_{i=1}^{n}\left(u_i\bar{y}_{i(m_i)} - \bar{y}_{S2}^{*}\right)\left(u_i\bar{x}_{i(m_i)} - \bar{x}_{S2}^{*}\right), \qquad s_{ixy} = \frac{1}{m_i-1}\sum_{j=1}^{m_i}\left(x_{ij} - \bar{x}_{i(m_i)}\right)\left(y_{ij} - \bar{y}_{i(m_i)}\right).$$

It can be shown that

$$E(s_{bxy}^{*}) = S_{bxy}^{*} + \frac{1}{N}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ixy}, \qquad E(s_{ixy}) = S_{ixy},$$

so

$$E\left[\frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{ixy}\right] = \frac{1}{N}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ixy}.$$

Thus

$$\begin{aligned}
\hat{S}_{bxy}^{*} &= s_{bxy}^{*} - \frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{ixy}\\
\hat{S}_{bx}^{*2} &= s_{bx}^{*2} - \frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{ix}^2\\
\hat{S}_{by}^{*2} &= s_{by}^{*2} - \frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{iy}^2.
\end{aligned}$$
Also,

$$E\left[\frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{ix}^2\right] = \frac{1}{N}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ix}^2$$

$$E\left[\frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{iy}^2\right] = \frac{1}{N}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{iy}^2.$$

A consistent estimator of the MSE of $\bar{y}_{S2}^{**}$ can be obtained by substituting the unbiased estimators of the respective statistics in $MSE(\bar{y}_{S2}^{**})$ as

$$\begin{aligned}
\widehat{MSE}(\bar{y}_{S2}^{**}) &\approx \left(\frac{1}{n}-\frac{1}{N}\right)\left(s_{by}^{*2} - 2r^{*}s_{bxy}^{*} + r^{*2}s_{bx}^{*2}\right) + \frac{1}{nN}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left(s_{iy}^2 - 2r^{*}s_{ixy} + r^{*2}s_{ix}^2\right)\\
&\approx \left(\frac{1}{n}-\frac{1}{N}\right)\frac{1}{n-1}\sum_{i=1}^{n}u_i^2\left(\bar{y}_{i(m_i)} - r^{*}\bar{x}_{i(m_i)}\right)^2 + \frac{1}{nN}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left(s_{iy}^2 - 2r^{*}s_{ixy} + r^{*2}s_{ix}^2\right)
\end{aligned}$$

where $r^{*} = \dfrac{\bar{y}_{S2}^{*}}{\bar{x}_{S2}^{*}}$.
Sampling Theory

MODULE XI
LECTURE - 36
SYSTEMATIC SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Systematic sampling is operationally more convenient than simple random sampling, and it also ensures that each unit has an equal probability of inclusion in the sample. In this method, the first unit is selected with the help of random numbers, and the remaining units are selected automatically according to a predetermined pattern.

Suppose the units in the population are numbered 1 to N in some order. Suppose further that N is expressible
as a product of two integers n and k, so that N = nk.

To draw a sample of size n,

 select a random number between 1 and k;

 suppose it is i;

 select the first unit whose serial number is i;

 select every kth unit after the ith unit;

 the sample will then contain the units with serial numbers i, i + k, i + 2k, ..., i + (n − 1)k.

So the first unit is selected at random and the other units are selected systematically. Such a sample is called an every kth systematic sample, and k is termed the sampling interval. This is also known as linear systematic sampling.
The observations in systematic sampling are arranged as in the following table:

Systematic sample number:    1             2             3            ...   i             ...   k
Sample composition:   1      y_1           y_2           y_3          ...   y_i           ...   y_k
                      2      y_{k+1}       y_{k+2}       y_{k+3}      ...   y_{k+i}       ...   y_{2k}
                      ...    ...           ...           ...          ...   ...           ...   ...
                      n      y_{(n-1)k+1}  y_{(n-1)k+2}  y_{(n-1)k+3} ...   y_{(n-1)k+i}  ...   y_{nk}
Probability:                 1/k           1/k           1/k          ...   1/k           ...   1/k
Sample mean:                 ȳ_1           ȳ_2           ȳ_3          ...   ȳ_i           ...   ȳ_k

Example: Let N = 50 and n = 5, so k = 10. Suppose the first number selected at random between 1 and 10 is 3. Then the systematic sample consists of the units with serial numbers 3, 13, 23, 33, 43.
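As a minimal illustrative sketch (not part of the original notes), the following Python function draws a linear systematic sample and reproduces the example above when the start is 3:

```python
import random

def systematic_sample(N, n, start=None):
    """Draw a linear systematic sample of size n from units numbered 1..N,
    assuming N = nk so that the sampling interval k is an integer."""
    k = N // n                                                 # sampling interval
    i = start if start is not None else random.randint(1, k)  # random start in 1..k
    return [i + j * k for j in range(n)]                       # i, i+k, ..., i+(n-1)k

# Reproducing the example: N = 50, n = 5, k = 10, start i = 3.
print(systematic_sample(50, 5, start=3))                       # [3, 13, 23, 33, 43]
```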
Systematic sampling in two dimensions:

Assume that the units in a population are arranged in the form of $m\ell$ rows, each row containing $nk$ units. A sample of size $mn$ is required. Then

 select a pair of random numbers (i, j) such that $i \le \ell$ and $j \le k$;

 select the (i, j)th unit, i.e., the jth unit in the ith row, as the first unit;

 the rows to be selected are then

$$i,\; i+\ell,\; i+2\ell,\; \ldots,\; i+(m-1)\ell$$

and the columns to be selected are

$$j,\; j+k,\; j+2k,\; \ldots,\; j+(n-1)k;$$

 the points at which the m selected rows and n selected columns intersect determine the positions of the mn selected units in the sample.

Such a sample is called an aligned sample; a short code sketch of this scheme appears below.

An alternative approach to select the sample is as follows:

 independently select n random integers $i_1, i_2, \ldots, i_n$ such that each of them is less than or equal to $\ell$;

 independently select m random integers $j_1, j_2, \ldots, j_m$ such that each of them is less than or equal to k;

 the units selected in the sample will have the coordinates

$$(i_1 + r\ell,\; j_{r+1}),\; (i_2 + r\ell,\; j_{r+1} + k),\; (i_3 + r\ell,\; j_{r+1} + 2k),\; \ldots,\; (i_n + r\ell,\; j_{r+1} + (n-1)k).$$

Such a sample is called an unaligned sample.

Under certain conditions, an unaligned sample is often superior to both an aligned sample and a stratified random sample.
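The following is a minimal sketch, under the assumptions above, of how the aligned two-dimensional sample can be selected; the grid dimensions used are hypothetical:

```python
import random

def aligned_sample(m, l, n, k):
    """Select an aligned two-dimensional systematic sample from a grid of
    m*l rows and n*k columns (rows and columns numbered from 1)."""
    i = random.randint(1, l)                 # random row start, i <= l
    j = random.randint(1, k)                 # random column start, j <= k
    rows = [i + r * l for r in range(m)]     # i, i+l, ..., i+(m-1)l
    cols = [j + c * k for c in range(n)]     # j, j+k, ..., j+(n-1)k
    return [(r, c) for r in rows for c in cols]   # the mn selected units

print(aligned_sample(m=3, l=4, n=2, k=5))    # 6 units from a 12 x 10 grid
```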
Advantages of systematic sampling:

1. It is easier to draw a sample and often easier to execute the drawing without mistakes. This is particularly advantageous when the drawing is done in the field or in offices, as there may be a substantial saving in time.

2. The cost is low and the selection of units is simple. Much less training is needed for surveyors to collect units through systematic sampling.

3. The systematic sample is spread more evenly over the population, so no large part of the population will fail to be represented in the sample; the sample gives a better cross-section of the population. Systematic sampling, however, fails if there are too many blanks (missing units) in the frame.

Relation to cluster sampling:

The systematic sample can be viewed from the cluster sampling point of view. With N = nk, there are k possible systematic samples. The same population can be viewed as if divided into k large sampling units, each of which contains n of the original units. The operation of choosing a systematic sample is then equivalent to choosing one of the large sampling units at random, which constitutes the whole sample. A systematic sample is thus a simple random sample of one cluster unit from a population of k cluster units.
Estimation of the population mean when N = nk:

Let $y_{ij}$ be the observation on the unit bearing the serial number $i + (j-1)k$ in the population, $i = 1, 2, \ldots, k$, $j = 1, 2, \ldots, n$.

Suppose the drawn random number is $i \le k$. The sample then consists of the ith column of the earlier table. Consider the sample mean

$$\bar{y}_{sy} = \bar{y}_i = \frac{1}{n}\sum_{j=1}^{n}y_{ij}$$

as an estimator of the population mean

$$\bar{Y} = \frac{1}{nk}\sum_{i=1}^{k}\sum_{j=1}^{n}y_{ij} = \frac{1}{k}\sum_{i=1}^{k}\bar{y}_i.$$

The probability of selecting the ith column as the systematic sample is $\frac{1}{k}$. So

$$E(\bar{y}_{sy}) = \frac{1}{k}\sum_{i=1}^{k}\bar{y}_i = \bar{Y}.$$

Thus $\bar{y}_{sy}$ is an unbiased estimator of $\bar{Y}$.
Further,

$$Var(\bar{y}_{sy}) = \frac{1}{k}\sum_{i=1}^{k}(\bar{y}_i - \bar{Y})^2.$$

Consider

$$\begin{aligned}
(N-1)S^2 &= \sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{Y})^2\\
&= \sum_{i=1}^{k}\sum_{j=1}^{n}\left[(y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{Y})\right]^2\\
&= \sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_i)^2 + n\sum_{i=1}^{k}(\bar{y}_i - \bar{Y})^2\\
&= k(n-1)S_{wsy}^2 + n\sum_{i=1}^{k}(\bar{y}_i - \bar{Y})^2
\end{aligned}$$

where

$$S_{wsy}^2 = \frac{1}{k(n-1)}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_i)^2$$

is the variation among the units that lie within the same systematic sample. Thus, with N = nk,

$$Var(\bar{y}_{sy}) = \frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S_{wsy}^2 = \frac{N-1}{N}S^2 - \frac{n-1}{n}S_{wsy}^2,$$

where the first term is the variation of the population as a whole and the second is the pooled within variation of the k systematic samples. This expression shows that $Var(\bar{y}_{sy})$ becomes smaller when the variation within the systematic samples is large. Thus higher heterogeneity within samples makes the estimator more efficient, and such within-sample heterogeneity is well expected in a good systematic sample.
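Since all k possible systematic samples can be enumerated for a small population, the decomposition above can be checked numerically. The following sketch uses arbitrary simulated data (not from the notes):

```python
import numpy as np

# Enumerate all k systematic samples from a small population with N = nk and
# verify Var(y_sy) = ((N-1)/N) S^2 - ((n-1)/n) S^2_wsy; the data are arbitrary.
rng = np.random.default_rng(1)
n, k = 4, 5
N = n * k
y = rng.normal(10, 2, size=N)                 # population values y_1, ..., y_N

samples = [y[i::k] for i in range(k)]         # the i-th systematic sample
means = np.array([s.mean() for s in samples])
Ybar = y.mean()

var_sy = ((means - Ybar) ** 2).mean()         # (1/k) sum_i (ybar_i - Ybar)^2
S2 = y.var(ddof=1)
S2_wsy = sum(((s - s.mean()) ** 2).sum() for s in samples) / (k * (n - 1))
rhs = (N - 1) / N * S2 - (n - 1) / n * S2_wsy
print(np.isclose(var_sy, rhs))                # True: the decomposition holds

# Relative efficiency against SRSWOR of the same size n
var_srs = (N - n) / (N * n) * S2
print(var_srs / var_sy)
```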
Alternative form of the variance:

$$\begin{aligned}
Var(\bar{y}_{sy}) &= \frac{1}{k}\sum_{i=1}^{k}(\bar{y}_i - \bar{Y})^2\\
&= \frac{1}{k}\sum_{i=1}^{k}\left[\frac{1}{n}\sum_{j=1}^{n}y_{ij} - \bar{Y}\right]^2 = \frac{1}{kn^2}\sum_{i=1}^{k}\left[\sum_{j=1}^{n}(y_{ij} - \bar{Y})\right]^2\\
&= \frac{1}{kn^2}\sum_{i=1}^{k}\left[\sum_{j=1}^{n}(y_{ij} - \bar{Y})^2 + \sum_{j(\neq\ell)=1}^{n}\sum_{\ell=1}^{n}(y_{ij} - \bar{Y})(y_{i\ell} - \bar{Y})\right]\\
&= \frac{1}{kn^2}\left[(nk-1)S^2 + \sum_{i=1}^{k}\sum_{j(\neq\ell)=1}^{n}\sum_{\ell=1}^{n}(y_{ij} - \bar{Y})(y_{i\ell} - \bar{Y})\right].
\end{aligned}$$

The intraclass correlation between the pairs of units that are in the same systematic sample is

$$\rho_w = \frac{E(y_{ij} - \bar{Y})(y_{i\ell} - \bar{Y})}{E(y_{ij} - \bar{Y})^2} = \frac{\dfrac{1}{nk(n-1)}\displaystyle\sum_{i=1}^{k}\sum_{j(\neq\ell)=1}^{n}\sum_{\ell=1}^{n}(y_{ij} - \bar{Y})(y_{i\ell} - \bar{Y})}{\dfrac{nk-1}{nk}\,S^2}; \qquad -\frac{1}{nk-1}\le\rho_w\le 1.$$

So substituting

$$\sum_{i=1}^{k}\sum_{j(\neq\ell)=1}^{n}\sum_{\ell=1}^{n}(y_{ij} - \bar{Y})(y_{i\ell} - \bar{Y}) = (n-1)(nk-1)\rho_w S^2$$

in $Var(\bar{y}_{sy})$ gives

$$Var(\bar{y}_{sy}) = \frac{nk-1}{nk}\cdot\frac{S^2}{n}\left[1 + \rho_w(n-1)\right] = \frac{N-1}{N}\cdot\frac{S^2}{n}\left[1 + \rho_w(n-1)\right].$$
Comparison with SRSWOR:

For a SRSWOR sample of size n,

$$Var(\bar{y}_{SRS}) = \frac{N-n}{Nn}S^2 = \frac{nk-n}{Nn}S^2 = \frac{k-1}{N}S^2.$$

Since, with N = nk,

$$Var(\bar{y}_{sy}) = \frac{N-1}{N}S^2 - \frac{n-1}{n}S_{wsy}^2,$$

we have

$$Var(\bar{y}_{SRS}) - Var(\bar{y}_{sy}) = \left(\frac{k-1}{N} - \frac{N-1}{N}\right)S^2 + \frac{n-1}{n}S_{wsy}^2 = \frac{n-1}{n}\left(S_{wsy}^2 - S^2\right).$$

Thus $\bar{y}_{sy}$ is

 more efficient than $\bar{y}_{SRS}$ when $S_{wsy}^2 > S^2$,

 less efficient than $\bar{y}_{SRS}$ when $S_{wsy}^2 < S^2$,

 equally efficient as $\bar{y}_{SRS}$ when $S_{wsy}^2 = S^2$.
Also, the relative efficiency of $\bar{y}_{sy}$ relative to $\bar{y}_{SRS}$ is

$$RE = \frac{Var(\bar{y}_{SRS})}{Var(\bar{y}_{sy})} = \frac{\dfrac{N-n}{Nn}S^2}{\dfrac{N-1}{Nn}S^2\left[1 + \rho_w(n-1)\right]} = \frac{N-n}{N-1}\cdot\frac{1}{1 + \rho_w(n-1)} = \frac{n(k-1)}{nk-1}\cdot\frac{1}{1 + \rho_w(n-1)}; \qquad -\frac{1}{nk-1}\le\rho_w\le 1.$$

Thus $\bar{y}_{sy}$ is

 more efficient than $\bar{y}_{SRS}$ when $\rho_w < -\dfrac{1}{nk-1}$,

 less efficient than $\bar{y}_{SRS}$ when $\rho_w > -\dfrac{1}{nk-1}$,

 equally efficient as $\bar{y}_{SRS}$ when $\rho_w = -\dfrac{1}{nk-1}$.
Sampling Theory

MODULE XI
LECTURE - 37
SYSTEMATIC SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Comparison with stratified sampling:

The systematic sample can also be viewed as if arising from a stratified sample. If the population of N = nk units is divided into n strata and one unit is randomly drawn from each stratum, then we get a stratified sample of size n. In doing so, just consider each row of the following arrangement as a stratum.

Systematic sample number:    1             2             3            ...   i             ...   k
Sample composition:   1      y_1           y_2           y_3          ...   y_i           ...   y_k
                      2      y_{k+1}       y_{k+2}       y_{k+3}      ...   y_{k+i}       ...   y_{2k}
                      ...    ...           ...           ...          ...   ...           ...   ...
                      n      y_{(n-1)k+1}  y_{(n-1)k+2}  y_{(n-1)k+3} ...   y_{(n-1)k+i}  ...   y_{nk}
Probability:                 1/k           1/k           1/k          ...   1/k           ...   1/k
Sample mean:                 ȳ_1           ȳ_2           ȳ_3          ...   ȳ_i           ...   ȳ_k
Recall that in the case of stratified sampling with k strata, the estimator

$$\bar{y}_{st} = \frac{1}{N}\sum_{j=1}^{k}N_j\bar{y}_j$$

is an unbiased estimator of the population mean.

Considering the set-up of the stratified sample in the set-up of the systematic sample, we have

 number of strata = n,

 size of each stratum = k (the row size),

 sample size drawn from each stratum = 1,

and $\bar{y}_{st}$ becomes

$$\bar{y}_{st} = \frac{1}{nk}\sum_{j=1}^{n}k\,\bar{y}_j = \frac{1}{n}\sum_{j=1}^{n}\bar{y}_j$$

$$\begin{aligned}
Var(\bar{y}_{st}) &= \frac{1}{n^2}\sum_{j=1}^{n}Var(\bar{y}_j)\\
&= \frac{1}{n^2}\sum_{j=1}^{n}\frac{k-1}{k\cdot 1}S_j^2 \qquad \left(\text{using } Var(\bar{y}_{SRS}) = \frac{N-n}{Nn}S^2\right)\\
&= \frac{k-1}{kn^2}\sum_{j=1}^{n}S_j^2 = \frac{k-1}{nk}S_{wst}^2 = \frac{N-n}{Nn}S_{wst}^2
\end{aligned}$$
where

$$S_j^2 = \frac{1}{k-1}\sum_{i=1}^{k}(y_{ij} - \bar{y}_j)^2$$

is the mean sum of squares of the units in the jth stratum, and

$$S_{wst}^2 = \frac{1}{n}\sum_{j=1}^{n}S_j^2 = \frac{1}{n(k-1)}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_j)^2$$

is the mean sum of squares within strata (or rows).

The variance of the systematic sample mean is

$$\begin{aligned}
Var(\bar{y}_{sy}) &= \frac{1}{k}\sum_{i=1}^{k}(\bar{y}_i - \bar{Y})^2\\
&= \frac{1}{k}\sum_{i=1}^{k}\left[\frac{1}{n}\sum_{j=1}^{n}y_{ij} - \frac{1}{n}\sum_{j=1}^{n}\bar{y}_j\right]^2\\
&= \frac{1}{n^2 k}\sum_{i=1}^{k}\left[\sum_{j=1}^{n}(y_{ij} - \bar{y}_j)\right]^2\\
&= \frac{1}{n^2 k}\left[\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_j)^2 + \sum_{i=1}^{k}\sum_{j(\neq\ell)=1}^{n}(y_{ij} - \bar{y}_j)(y_{i\ell} - \bar{y}_\ell)\right].
\end{aligned}$$
Now we simplify and express this expression in terms of an intraclass correlation coefficient. The intraclass correlation coefficient between the pairs of deviations of units which lie along the same row, measured from their stratum means, is defined as

$$\rho_{wst} = \frac{\dfrac{1}{nk(n-1)}\displaystyle\sum_{i=1}^{k}\sum_{j(\neq\ell)=1}^{n}(y_{ij} - \bar{y}_j)(y_{i\ell} - \bar{y}_\ell)}{\dfrac{1}{nk}\displaystyle\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_j)^2} = \frac{\displaystyle\sum_{i=1}^{k}\sum_{j(\neq\ell)=1}^{n}(y_{ij} - \bar{y}_j)(y_{i\ell} - \bar{y}_\ell)}{(N-n)(n-1)S_{wst}^2},$$

so

$$Var(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[(N-n)S_{wst}^2 + (N-n)(n-1)\rho_{wst}S_{wst}^2\right] = \frac{N-n}{Nn}S_{wst}^2\left[1 + (n-1)\rho_{wst}\right] \quad (\text{using } N = nk).$$

Thus

$$Var(\bar{y}_{st}) - Var(\bar{y}_{sy}) = -\frac{N-n}{Nn}(n-1)\rho_{wst}S_{wst}^2$$
and the relative efficiency of systematic sampling relative to the equivalent stratified sampling is given by

$$RE = \frac{Var(\bar{y}_{st})}{Var(\bar{y}_{sy})} = \frac{1}{1 + (n-1)\rho_{wst}}.$$

So systematic sampling is

 more efficient than the corresponding equivalent stratified sample when $\rho_{wst} < 0$,

 less efficient than the corresponding equivalent stratified sample when $\rho_{wst} > 0$,

 equally efficient as the corresponding equivalent stratified sample when $\rho_{wst} = 0$.

Comparison of systematic sampling, stratified sampling and SRS for a population with a linear trend:

We assume that the values of the units in the population increase according to a linear trend, so that the values of successive units follow the linear model

$$y_i = a + bi, \qquad i = 1, 2, \ldots, N.$$

Now we determine the variances of $\bar{y}_{SRS}$, $\bar{y}_{sy}$ and $\bar{y}_{st}$ under this linear trend.
Under SRSWOR:

$$Var(\bar{y}_{SRS}) = \frac{N-n}{Nn}S^2, \qquad N = nk.$$

Here

$$\bar{Y} = a + b\,\frac{1}{N}\sum_{i=1}^{N}i = a + b\,\frac{1}{N}\cdot\frac{N(N+1)}{2} = a + b\,\frac{N+1}{2}$$

$$\begin{aligned}
S^2 &= \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{Y})^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(a + bi - a - b\,\frac{N+1}{2}\right)^2\\
&= \frac{b^2}{N-1}\sum_{i=1}^{N}\left(i - \frac{N+1}{2}\right)^2 = \frac{b^2}{N-1}\left[\sum_{i=1}^{N}i^2 - N\left(\frac{N+1}{2}\right)^2\right]\\
&= \frac{b^2}{N-1}\left[\frac{N(N+1)(2N+1)}{6} - \frac{N(N+1)^2}{4}\right] = b^2\,\frac{N(N+1)}{12}.
\end{aligned}$$

So

$$Var(\bar{y}_{SRS}) = \frac{nk-n}{nk\cdot n}\,b^2\,\frac{nk(nk+1)}{12} = \frac{b^2}{12}(k-1)(nk+1).$$
Under systematic sampling:

Earlier, $y_{ij}$ denoted the value of the study variable on the jth unit in the ith systematic sample. Now $y_{ij}$ represents the value of the $[i + (j-1)k]$th unit of the population, so

$$y_{ij} = a + b\left[i + (j-1)k\right], \qquad i = 1, 2, \ldots, k; \; j = 1, 2, \ldots, n.$$

Then

$$\bar{y}_i = \frac{1}{n}\sum_{j=1}^{n}\left[a + b\{i + (j-1)k\}\right] = a + b\left(i + \frac{n-1}{2}k\right)$$

and

$$\begin{aligned}
\sum_{i=1}^{k}(\bar{y}_i - \bar{Y})^2 &= b^2\sum_{i=1}^{k}\left(i + \frac{n-1}{2}k - \frac{nk+1}{2}\right)^2 = b^2\sum_{i=1}^{k}\left(i - \frac{k+1}{2}\right)^2\\
&= b^2\left[\sum_{i=1}^{k}i^2 + k\left(\frac{k+1}{2}\right)^2 - 2\cdot\frac{k+1}{2}\sum_{i=1}^{k}i\right]\\
&= b^2\left[\frac{k(k+1)(2k+1)}{6} + k\left(\frac{k+1}{2}\right)^2 - \frac{k(k+1)^2}{2}\right] = \frac{b^2}{12}\,k(k^2-1),
\end{aligned}$$

so

$$Var(\bar{y}_{sy}) = \frac{1}{k}\sum_{i=1}^{k}(\bar{y}_i - \bar{Y})^2 = \frac{1}{k}\cdot\frac{b^2}{12}\,k(k^2-1) = \frac{b^2}{12}(k^2-1).$$
Under stratified sampling:

$$y_{ij} = a + b\left[i + (j-1)k\right], \qquad i = 1, 2, \ldots, k; \; j = 1, 2, \ldots, n,$$

and

$$Var(\bar{y}_{st}) = \frac{N-n}{Nn}S_{wst}^2 = \frac{k-1}{nk}S_{wst}^2$$

where, noting that the jth stratum (row) mean is $\bar{y}_j = a + b\left(\dfrac{k+1}{2} + (j-1)k\right)$,

$$\begin{aligned}
S_{wst}^2 &= \frac{1}{n}\sum_{j=1}^{n}S_j^2 = \frac{1}{n(k-1)}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_j)^2\\
&= \frac{1}{n(k-1)}\sum_{i=1}^{k}\sum_{j=1}^{n}\left[a + b\{i + (j-1)k\} - a - b\left\{\frac{k+1}{2} + (j-1)k\right\}\right]^2\\
&= \frac{b^2}{n(k-1)}\sum_{i=1}^{k}\sum_{j=1}^{n}\left(i - \frac{k+1}{2}\right)^2 = \frac{b^2}{n(k-1)}\cdot\frac{nk(k^2-1)}{12} = b^2\,\frac{k(k+1)}{12}.
\end{aligned}$$

Hence

$$Var(\bar{y}_{st}) = \frac{k-1}{nk}\cdot b^2\,\frac{k(k+1)}{12} = \frac{b^2}{12}\cdot\frac{k^2-1}{n}.$$
If k is large, so that $\frac{1}{k}$ is negligible, then comparing $Var(\bar{y}_{st})$, $Var(\bar{y}_{sy})$ and $Var(\bar{y}_{SRS})$:

$$\begin{aligned}
Var(\bar{y}_{st}) : Var(\bar{y}_{sy}) : Var(\bar{y}_{SRS}) &= \frac{k^2-1}{n} : (k^2-1) : (k-1)(nk+1)\\
&= \frac{k+1}{n} : (k+1) : (nk+1)\\
&= \frac{k+1}{n(k+1)} : \frac{k+1}{k+1} : \frac{nk+1}{k+1}\\
&\approx \frac{1}{n} : 1 : n.
\end{aligned}$$

Thus

$$Var(\bar{y}_{st}) : Var(\bar{y}_{sy}) : Var(\bar{y}_{SRS}) \;::\; \frac{1}{n} : 1 : n.$$

So stratified sampling is best for a linearly trended population, and systematic sampling is the next best. A numerical check of these variances is sketched below.
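The sketch below (with arbitrary values of a and b) computes the three variances by enumeration under the linear-trend model:

```python
import numpy as np

# Population with linear trend y_i = a + b*i; compare the variances derived
# above for stratified, systematic, and SRSWOR sampling (a, b are arbitrary).
a, b, n, k = 2.0, 0.5, 10, 8
N = n * k
y = a + b * np.arange(1, N + 1)

S2 = y.var(ddof=1)
var_srs = (N - n) / (N * n) * S2                       # = b^2 (k-1)(nk+1)/12
var_sy = np.mean([(y[i::k].mean() - y.mean()) ** 2 for i in range(k)])
# One unit per stratum, each stratum (row) being k consecutive units:
S2_wst = np.mean([y[j * k:(j + 1) * k].var(ddof=1) for j in range(n)])
var_st = (k - 1) / (n * k) * S2_wst                    # = b^2 (k^2-1)/(12 n)

print(var_st, var_sy, var_srs)                         # roughly 1/n : 1 : n
print(np.isclose(var_sy, b**2 * (k**2 - 1) / 12))      # matches b^2 (k^2-1)/12
```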
Sampling Theory

MODULE XI
LECTURE - 38
SYSTEMATIC SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Estimation of variance:

As such there is only one cluster in a systematic sample, so the variance, in principle, cannot be estimated from it. Some approximations have been suggested.

1. Treat the systematic sample as if it were a random sample:

$$\widehat{Var}(\bar{y}_{sy}) = \left(\frac{1}{n} - \frac{1}{nk}\right)s_{wc}^2 \qquad \text{where } s_{wc}^2 = \frac{1}{n-1}\sum_{j=0}^{n-1}\left(y_{i+jk} - \bar{y}_i\right)^2.$$

This estimator under-estimates the true variance.

2. Use of the successive differences of the values gives

$$\widehat{Var}(\bar{y}_{sy}) = \left(\frac{1}{n} - \frac{1}{nk}\right)\frac{1}{2(n-1)}\sum_{j=0}^{n-2}\left(y_{i+jk} - y_{i+(j+1)k}\right)^2.$$

This is a biased estimator of the true variance. (A code sketch of this estimator appears after this list.)

3. Use the balanced differences of $y_1, y_2, \ldots, y_n$ to get

$$\widehat{Var}(\bar{y}_{sy}) = \left(\frac{1}{n} - \frac{1}{nk}\right)\frac{1}{5(n-2)}\sum_{i=1}^{n-2}\left(\frac{y_i}{2} - y_{i+1} + \frac{y_{i+2}}{2}\right)^2$$

or

$$\widehat{Var}(\bar{y}_{sy}) = \left(\frac{1}{n} - \frac{1}{nk}\right)\frac{1}{15(n-4)}\sum_{i=1}^{n-4}\left(\frac{y_i}{2} - y_{i+1} + y_{i+2} - y_{i+3} + \frac{y_{i+4}}{2}\right)^2.$$

4. The interpenetrating subsamples can be utilized by dividing the sample into $c$ groups, each of size $\frac{n}{c}$. With the group means $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_c$, find

$$\bar{y} = \frac{1}{c}\sum_{t=1}^{c}\bar{y}_t, \qquad \widehat{Var}(\bar{y}_{sy}) = \frac{1}{c(c-1)}\sum_{t=1}^{c}(\bar{y}_t - \bar{y})^2.$$
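As an illustration of method 2 above, here is a minimal sketch with hypothetical sample values (the estimator needs only the single observed systematic sample):

```python
import numpy as np

def var_successive_diff(sample, n, k):
    """Estimate Var(y_sy) from a single systematic sample using the
    successive-difference formula (method 2 above); a biased estimator."""
    d = np.diff(sample)                                  # y_{i+(j+1)k} - y_{i+jk}
    return (1.0 / n - 1.0 / (n * k)) * (d ** 2).sum() / (2 * (n - 1))

sample = np.array([12.1, 11.8, 12.6, 12.0, 12.4])        # hypothetical, n = 5, k = 10
print(var_successive_diff(sample, n=5, k=10))
```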

Systematic sampling when N ≠ nk:

When N is not expressible as nk, suppose N can be expressed as

$$N = nk + p, \qquad p < k.$$

Then consider the following sample mean as an estimator of the population mean:

$$\bar{y}_{sy} = \bar{y}_i = \begin{cases}\dfrac{1}{n+1}\displaystyle\sum_{j=1}^{n+1}y_{ij} & \text{if } i \le p\\[2ex] \dfrac{1}{n}\displaystyle\sum_{j=1}^{n}y_{ij} & \text{if } i > p.\end{cases}$$

In this case,

$$E(\bar{y}_i) = \frac{1}{k}\left[\sum_{i=1}^{p}\left(\frac{1}{n+1}\sum_{j=1}^{n+1}y_{ij}\right) + \sum_{i=p+1}^{k}\left(\frac{1}{n}\sum_{j=1}^{n}y_{ij}\right)\right] \neq \bar{Y}.$$

So $\bar{y}_{sy}$ is a biased estimator of $\bar{Y}$.
An unbiased estimator of $\bar{Y}$ is

$$\bar{y}_{sy}^{*} = \frac{k}{N}\sum_{j}y_{ij} = \frac{k}{N}\,C_i$$

where $C_i$ is the total of the values of the ith column (so $C_i = n\bar{y}_i$ or $(n+1)\bar{y}_i$ according as $i > p$ or $i \le p$). Then

$$E(\bar{y}_{sy}^{*}) = \frac{k}{N}E(C_i) = \frac{k}{N}\cdot\frac{1}{k}\sum_{i=1}^{k}C_i = \bar{Y}$$

$$Var(\bar{y}_{sy}^{*}) = \frac{k^2}{N^2}\cdot\frac{k-1}{k}\,S_c^{*2}$$

where

$$S_c^{*2} = \frac{1}{k-1}\sum_{i=1}^{k}\left(C_i - \frac{N\bar{Y}}{k}\right)^2.$$
Now we consider another procedure which is adopted when the population size N is not expressible as the product of n and k. Let

$$N = nq + r.$$

Then take the sampling interval as

$$k = \begin{cases} q & \text{if } r \le \dfrac{n}{2}\\[1ex] q+1 & \text{if } r > \dfrac{n}{2}.\end{cases}$$

Let $\left\lfloor\dfrac{M}{g}\right\rfloor$ denote the largest integer contained in $\dfrac{M}{g}$. If $k = q^{*}$ ($= q$ or $q+1$), then the number of units expected in the sample is

$$\left\lfloor\frac{N}{q^{*}}\right\rfloor \text{ with probability } \left\lfloor\frac{N}{q^{*}}\right\rfloor + 1 - \frac{N}{q^{*}}, \qquad \left\lfloor\frac{N}{q^{*}}\right\rfloor + 1 \text{ with probability } \frac{N}{q^{*}} - \left\lfloor\frac{N}{q^{*}}\right\rfloor.$$

If $q^{*} = q$, then we get

$$n^{*} = \begin{cases} n + \left\lfloor\dfrac{r}{q}\right\rfloor & \text{with probability } \left\lfloor\dfrac{r}{q}\right\rfloor + 1 - \dfrac{r}{q}\\[2ex] n + \left\lfloor\dfrac{r}{q}\right\rfloor + 1 & \text{with probability } \dfrac{r}{q} - \left\lfloor\dfrac{r}{q}\right\rfloor.\end{cases}$$

Similarly, if $q^{*} = q + 1$, then

$$n^{*} = \begin{cases} n - \left\lfloor\dfrac{n-r}{q+1}\right\rfloor & \text{with probability } \left\lfloor\dfrac{n-r}{q+1}\right\rfloor + 1 - \dfrac{n-r}{q+1}\\[2ex] n - \left\lfloor\dfrac{n-r}{q+1}\right\rfloor - 1 & \text{with probability } \dfrac{n-r}{q+1} - \left\lfloor\dfrac{n-r}{q+1}\right\rfloor.\end{cases}$$
Example: Let N = 17 and n = 5. Then q = 3 and r = 2. Since $r < \frac{n}{2}$, k = q = 3. The sample sizes would then be

$$n^{*} = \begin{cases} n + \left\lfloor\dfrac{r}{q}\right\rfloor = 5 & \text{with probability } \left\lfloor\dfrac{r}{q}\right\rfloor + 1 - \dfrac{r}{q} = \dfrac{1}{3}\\[2ex] n + \left\lfloor\dfrac{r}{q}\right\rfloor + 1 = 6 & \text{with probability } \dfrac{r}{q} - \left\lfloor\dfrac{r}{q}\right\rfloor = \dfrac{2}{3}.\end{cases}$$

This can be verified from the following enumeration:

Systematic sample number    Systematic sample                     Probability
1                           Y_1, Y_4, Y_7, Y_10, Y_13, Y_16       1/3
2                           Y_2, Y_5, Y_8, Y_11, Y_14, Y_17       1/3
3                           Y_3, Y_6, Y_9, Y_12, Y_15             1/3
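The estimator $\frac{k}{N}\times(\text{sample total})$, whose unbiasedness is proved in the theorem below, can be checked directly on this example. A minimal sketch with toy values $Y_j = j$:

```python
from fractions import Fraction

# Enumerate the k = 3 systematic samples from N = 17 (the example above) and
# check that (k/N) * (sample total) is unbiased; toy values Y_j = j.
N, k = 17, 3
Y = list(range(1, N + 1))

samples = [[Y[j] for j in range(i, N, k)] for i in range(k)]
for s in samples:
    print(len(s), s)                          # sizes 6, 6, 5 as in the table

est = [Fraction(k, N) * sum(s) for s in samples]
print(sum(est, Fraction(0)) / k)              # average over equally likely samples
print(Fraction(sum(Y), N))                    # the true mean: both equal 9
```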

We now prove the following theorem, which shows how to obtain an unbiased estimator of the population mean when N ≠ nk.

Theorem: In systematic sampling with sampling interval k from a population of size N ≠ nk, an unbiased estimator of the population mean $\bar{Y}$ is given by

$$\hat{\bar{Y}} = \frac{k}{N}\left(\sum y\right)_i$$

where i stands for the ith systematic sample, i = 1, 2, ..., k, and $\left(\sum y\right)_i$ denotes the total of the $n'$ values in the ith systematic sample, $n'$ being its size.

Proof: Each systematic sample has probability $\frac{1}{k}$. Hence

$$E(\hat{\bar{Y}}) = \sum_{i=1}^{k}\frac{1}{k}\cdot\frac{k}{N}\left(\sum y\right)_i = \frac{1}{N}\sum_{i=1}^{k}\left(\sum y\right)_i.$$

Now, each unit occurs in only one of the k possible systematic samples. Hence

$$\sum_{i=1}^{k}\left(\sum y\right)_i = \sum_{i=1}^{N}Y_i,$$

which on substitution in $E(\hat{\bar{Y}})$ proves the theorem.

When N ≠ nk, the systematic samples are not all of the same size and the sample mean is not an unbiased estimator of the population mean. To overcome these disadvantages, circular systematic sampling is proposed. Circular systematic sampling consists of selecting a random number from 1 to N, selecting the unit corresponding to this random number, and thereafter selecting every kth unit in a cyclical manner till a sample of n units is obtained, k being the nearest integer to $\frac{N}{n}$. In other words, if i is a number selected at random from 1 to N, then the circular systematic sample consists of the units with serial numbers

$$\begin{cases} i + jk, & \text{if } i + jk \le N\\ i + jk - N, & \text{if } i + jk > N\end{cases} \qquad j = 0, 1, 2, \ldots, (n-1).$$

This sampling scheme ensures an equal probability of inclusion in the sample for every unit.
Example: Let N = 14 and n = 5. Then k = nearest integer to $\frac{14}{5}$ = 3. Let the first number selected at random from 1 to 14 be 7. Then the circular systematic sample consists of the units with serial numbers

7, 10, 13, 16 − 14 = 2, 19 − 14 = 5.

[Figure illustrating the cyclical selection of these units is omitted.]
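A minimal sketch of circular systematic selection (the start is forced to 7 to reproduce the example; Python's round is used for the nearest integer):

```python
def circular_systematic_sample(N, n, start):
    """Circular systematic sampling: begin at a start in 1..N and take
    every k-th unit cyclically, k being the integer nearest to N/n."""
    k = round(N / n)
    return [((start - 1 + j * k) % N) + 1 for j in range(n)]   # serial numbers 1..N

# Reproducing the example: N = 14, n = 5, k = 3, start 7.
print(circular_systematic_sample(14, 5, start=7))   # [7, 10, 13, 2, 5]
```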
Theorem: In circular systematic sampling, the sample mean is an unbiased estimator of the population mean.

Proof: If i is the number selected at random, then the circular systematic sample mean is

$$\bar{y} = \frac{1}{n}\left(\sum y\right)_i$$

where $\left(\sum y\right)_i$ denotes the total of the y values in the ith circular systematic sample, i = 1, 2, ..., N. We note here that in circular systematic sampling there are N circular systematic samples, each having probability $\frac{1}{N}$ of selection. Hence,

$$E(\bar{y}) = \sum_{i=1}^{N}\frac{1}{n}\left(\sum y\right)_i \times\frac{1}{N} = \frac{1}{Nn}\sum_{i=1}^{N}\left(\sum y\right)_i.$$

Clearly, each unit of the population occurs in n of the N possible circular systematic samples. Hence,

$$\sum_{i=1}^{N}\left(\sum y\right)_i = n\sum_{i=1}^{N}Y_i,$$

which on substitution in $E(\bar{y})$ proves the theorem.
What to do when N ≠ nk:

One of the following procedures may be adopted when N ≠ nk.

i. Drop one unit at random if the sample has (n + 1) units.

ii. Eliminate some units so that N = nk.

iii. Adopt the circular systematic sampling scheme.

iv. Round off the fractional sampling interval k.
Sampling Theory
MODULE XII
LECTURE - 39
SAMPLING ON SUCCESSIVE
OCCASIONS

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Many times we are interested in measuring a characteristic of a population on several occasions, e.g., to estimate the time trend of the population mean, the current value of the population mean, or the value of the population mean over several points of time.

When the same population is sampled repeatedly, the opportunities for a flexible sampling scheme are greatly enhanced. For example, on the hth occasion we may have a part of the sample that is matched with (common to) the sample on the (h − 1)th occasion, parts matching with both the (h − 1)th and (h − 2)th occasions, etc.

Such partial matching is termed sampling on successive occasions with partial replacement of units, rotation sampling, or sampling for a time series.

Notations:

Let P be the fixed population with N units.

$y_t$ : value of a certain dynamic character which changes with time and can be measured for each unit on a number of occasions, $t = 1, 2, \ldots$

$y_{ij}$ : value of y on the jth unit of the population on the ith occasion, i = 1, 2, ..., h; j = 1, ..., N.

$\bar{Y}_i = \dfrac{1}{N}\displaystyle\sum_{j}y_{ij}$ : population mean for the ith occasion.

$S_i^2 = \dfrac{1}{N-1}\displaystyle\sum_{j=1}^{N}(y_{ij} - \bar{Y}_i)^2$ : population variance for the ith occasion; it is assumed that $S_1^2 = S_2^2 = \ldots = S^2$.

$\rho_{ii^*} = \dfrac{1}{(N-1)S^2}\displaystyle\sum_{j=1}^{N}(y_{ij} - \bar{Y}_i)(y_{i^*j} - \bar{Y}_{i^*})$ : population correlation coefficient between the observations on occasions $i$ and $i^*$ $(i < i^* = 1, 2, \ldots, h)$; $\rho = \rho_{12}$.

$s_i^{*}$ : sample of size $n_i$ selected on the ith occasion.

$s_{im}^{*}$ : part of $s_i^{*}$ which is common to (i.e., matched with) $s_{i-1}^{*}$, i.e., $s_{im}^{*} = s_i^{*}\cap s_{i-1}^{*}$, $i = 2, 3, \ldots, h$ $(s_{1m}^{*} = s_{2m}^{*})$. Note that $s_{1m}^{*}$ and $s_{2m}^{*}$ are of sizes $n_1^{*}$ and $n_2^{*}$ respectively.

$s_{iu}^{*}$ : set of units in $s_i^{*}$ not obtained by the selection in $s_{im}^{*}$; often $s_{iu}^{*} = s_i^{*} - s_{im}^{*}$ $(i = 2, \ldots, h)$, with $s_{1u}^{*} = P - s_{1m}^{*}$. Note that $s_{iu}^{*}$ is of size $n_i^{**}\,(= n_i - n_i^{*})$.

$\bar{y}_i$ : sample mean of the units on the ith occasion.

$\bar{y}_i^{*}$ : sample mean of the units in $s_{im}^{*}$ on the ith occasion.

$\bar{y}_i^{**}$ : sample mean of the units in $s_{iu}^{*}$ on the ith occasion.

$\bar{y}_i^{***}$ : sample mean of the units in $s_{im}^{*}$ on the (i − 1)th occasion, i = 2, 3, ..., h $(\bar{y}_2^{***} = \bar{y}_1^{*}$; $\bar{y}_i^{***}$ depends on $\bar{y}_{i-1}$ and $\bar{y}_{i-1}^{**})$.
Sampling on two occasions:

Assume that

$$n_i = n, \qquad n_i^{*} = m, \qquad n_i^{**} = u\,(= n - m), \qquad i = 1, 2.$$

Suppose that the sample $s_1^{*}$ is an SRSWOR from P, and

$$s_2^{*} = s_{2m}^{*}\cup s_{2u}^{*}$$

where $s_{2m}^{*}$ is an SRSWOR sample of size m from $s_1^{*}$ and $s_{2u}^{*}$ is an SRSWOR sample of size u from $(P - s_1^{*})$.

Estimation of the population mean:

Two types of estimators are available for the estimation of the population mean:

1. Type 1 estimators: obtained by taking a linear combination of the estimators obtained from $s_{2u}^{*}$ and $s_{2m}^{*}$.

2. Type 2 estimators: obtained by considering the best linear combination of the sample means.
Type 1 estimators:

Two estimators are available for estimating $\bar{Y}_2$:

i. $t_{2u} = \bar{y}_2^{**}$ with

$$Var(\bar{y}_2^{**}) = \frac{S_2^2}{u} = \frac{1}{W_u} \text{ (say)}.$$

ii. $t_{2m}$ = the linear regression estimate of $\bar{Y}_2$ based on the regression of $y_{2j}$ on $y_{1j}$,

$$t_{2m} = \bar{y}_2^{*} + b(\bar{y}_1 - \bar{y}_1^{*})$$

where

$$b = \frac{\displaystyle\sum_{j\in s_{2m}^{*}}(y_{1j} - \bar{y}_1^{*})(y_{2j} - \bar{y}_2^{*})}{\displaystyle\sum_{j\in s_{2m}^{*}}(y_{1j} - \bar{y}_1^{*})^2}$$

is the sample regression coefficient.

Recall that in the case of double sampling, we had

$$\begin{aligned}
Var(\hat{\bar{Y}}_{regd}) &= S_y^2\left(\frac{1}{n}-\frac{1}{N}\right) - \rho^2 S_y^2\left(\frac{1}{n}-\frac{1}{n^{*}}\right)\\
&= \frac{S_y^2}{n} - \rho^2 S_y^2\left(\frac{1}{n}-\frac{1}{n^{*}}\right)\\
&= \frac{(1-\rho^2)S_y^2}{n} + \frac{\rho^2 S_y^2}{n^{*}} \qquad \left(\text{ignoring the term of order }\frac{1}{N}\right).
\end{aligned}$$
So in this case,

$$Var(t_{2m}) = \frac{S_2^2(1-\rho^2)}{m} + \frac{\rho^2 S_2^2}{n} = \frac{1}{W_m} \text{ (say)}.$$

If there are two uncorrelated unbiased estimators of a parameter, then the best linear unbiased estimator of the parameter can be obtained by combining them linearly with suitably chosen weights. Now we discuss how to choose the weights in such a linear combination of estimators.

Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two uncorrelated and unbiased estimators of $\theta$, i.e., $E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$, $Var(\hat{\theta}_1) = \sigma_1^2$, $Var(\hat{\theta}_2) = \sigma_2^2$, $Cov(\hat{\theta}_1, \hat{\theta}_2) = 0$.

Consider $\hat{\theta} = \omega\hat{\theta}_1 + (1-\omega)\hat{\theta}_2$, where $0\le\omega\le 1$ is the weight. Now choose $\omega$ such that $Var(\hat{\theta})$ is minimum:

$$Var(\hat{\theta}) = \omega^2\sigma_1^2 + (1-\omega)^2\sigma_2^2$$

$$\frac{\partial Var(\hat{\theta})}{\partial\omega} = 0 \;\Rightarrow\; 2\omega\sigma_1^2 - 2(1-\omega)\sigma_2^2 = 0 \;\Rightarrow\; \omega = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} = \omega^{*} \text{ (say)},$$

with

$$\frac{\partial^2 Var(\hat{\theta})}{\partial\omega^2}\Bigg|_{\omega=\omega^{*}} > 0.$$
The minimum variance achieved by $\hat{\theta}$ is

$$Var(\hat{\theta})_{Min} = \omega^{*2}\sigma_1^2 + (1-\omega^{*})^2\sigma_2^2 = \frac{\sigma_2^4\sigma_1^2}{(\sigma_1^2+\sigma_2^2)^2} + \frac{\sigma_1^4\sigma_2^2}{(\sigma_1^2+\sigma_2^2)^2} = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2} = \frac{1}{\dfrac{1}{\sigma_1^2}+\dfrac{1}{\sigma_2^2}}.$$

Now we implement this result in our case. Consider the linear combination of $t_{2u}$ and $t_{2m}$ as

$$\hat{\bar{Y}}_2 = \omega t_{2u} + (1-\omega)t_{2m}$$

where the weight $\omega$ is obtained as

$$\omega = \frac{W_u}{W_u + W_m},$$

so that $\hat{\bar{Y}}_2$ is the best combined estimate. The minimum variance with this choice of $\omega$ is

$$Var(\hat{\bar{Y}}_2) = \frac{1}{W_u + W_m} = \frac{S_2^2(n - u\rho^2)}{n^2 - u^2\rho^2}.$$

For u = 0 (complete matching), $Var(\hat{\bar{Y}}_2) = \dfrac{S_2^2}{n}$.

For u = n (no matching), $Var(\hat{\bar{Y}}_2) = \dfrac{S_2^2}{n}$.
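The inverse-variance weighting derived above is easy to apply directly; the following minimal sketch combines two hypothetical uncorrelated unbiased estimates:

```python
def combine(theta1_hat, var1, theta2_hat, var2):
    """Best linear combination of two uncorrelated unbiased estimators,
    using the optimal weight w = var2 / (var1 + var2) derived above."""
    w = var2 / (var1 + var2)                 # weight on the first estimator
    est = w * theta1_hat + (1 - w) * theta2_hat
    var = var1 * var2 / (var1 + var2)        # = 1 / (1/var1 + 1/var2)
    return est, var

# Hypothetical values: a noisier estimate (variance 4) and a sharper one (variance 1).
est, var = combine(10.2, 4.0, 9.6, 1.0)
print(est, var)                              # 9.72, 0.8 (below both input variances)
```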
Type II estimators:

We now consider the minimum variance linear unbiased estimator of $\bar{Y}_2$ under the same sampling scheme as for the Type I estimator. A best linear (linear in terms of the observed means) unbiased estimator of $\bar{Y}_2$ is of the form

$$\hat{\bar{Y}}_2^{*} = a\bar{y}_1^{**} + b\bar{y}_1^{*} + c\bar{y}_2^{*} + d\bar{y}_2^{**}$$

where the constants a, b, c, d and the matching fraction $\lambda\left(= \dfrac{m}{n} = \dfrac{n-u}{n}\right)$ are to be suitably chosen so as to minimize the variance. Assume $S_1^2 = S_2^2$.

Now

$$E(\hat{\bar{Y}}_2^{*}) = (a+b)\bar{Y}_1 + (c+d)\bar{Y}_2.$$

If $\hat{\bar{Y}}_2^{*}$ has to be an unbiased estimator of $\bar{Y}_2$, i.e., $E(\hat{\bar{Y}}_2^{*}) = \bar{Y}_2$, it requires

$$a + b = 0, \qquad c + d = 1.$$

Since a minimum variance unbiased estimator must be uncorrelated with every unbiased estimator of zero, we must have

$$Cov(\hat{\bar{Y}}_2^{*},\; \bar{y}_1^{**} - \bar{y}_1^{*}) = 0 \qquad (1)$$

$$Cov(\hat{\bar{Y}}_2^{*},\; \bar{y}_2^{*} - \bar{y}_2^{**}) = 0. \qquad (2)$$
Since, neglecting terms of order $\frac{1}{N}$,

$$Cov(\bar{y}_2^{*}, \bar{y}_1^{**}) = 0 = Cov(\bar{y}_2^{*}, \bar{y}_2^{**}), \qquad Cov(\bar{y}_2^{*}, \bar{y}_1^{*}) = \frac{\rho S^2}{m}, \qquad Cov(\bar{y}_2^{**}, \bar{y}_1^{**}) = Cov(\bar{y}_2^{**}, \bar{y}_2^{*}) = 0,$$

$$Var(\bar{y}_2^{*}) = \frac{S^2}{m}, \qquad Var(\bar{y}_2^{**}) = \frac{S^2}{u},$$

we can now solve (1) and (2). Using $b = -a$ and $d = 1 - c$,

$$\begin{aligned}
Cov(\hat{\bar{Y}}_2^{*}, \bar{y}_1^{**} - \bar{y}_1^{*}) &= Cov(a\bar{y}_1^{**} + b\bar{y}_1^{*} + c\bar{y}_2^{*} + d\bar{y}_2^{**},\; \bar{y}_1^{**} - \bar{y}_1^{*})\\
&= a\,Var(\bar{y}_1^{**}) + b\,Cov(\bar{y}_1^{*}, \bar{y}_1^{**}) + c\,Cov(\bar{y}_2^{*}, \bar{y}_1^{**}) + d\,Cov(\bar{y}_2^{**}, \bar{y}_1^{**})\\
&\quad - a\,Cov(\bar{y}_1^{**}, \bar{y}_1^{*}) - b\,Var(\bar{y}_1^{*}) - c\,Cov(\bar{y}_1^{*}, \bar{y}_2^{*}) - d\,Cov(\bar{y}_2^{**}, \bar{y}_1^{*}) = 0,
\end{aligned}$$

which reduces to

$$-\frac{aS^2}{m} + \frac{c\rho S^2}{m} = \frac{aS^2}{u}. \qquad (3)$$

Similarly, from (2), $Cov(\hat{\bar{Y}}_2^{*}, \bar{y}_2^{*} - \bar{y}_2^{**}) = 0$ gives

$$-\frac{a\rho S^2}{m} + \frac{cS^2}{m} = \frac{(1-c)S^2}{u}. \qquad (4)$$
Solving (3) and (4) gives

$$a = \frac{\lambda\mu\rho}{1-\rho^2\mu^2}, \qquad c = \frac{\lambda}{1-\rho^2\mu^2}$$

where

$$\mu = \frac{u}{n} = 1 - \lambda, \qquad \lambda = \frac{n-u}{n}, \qquad b = -a, \qquad d = 1 - c.$$

Substituting a, b, c, d, the best linear unbiased estimator of $\bar{Y}_2$ is

$$\hat{\bar{Y}}_2^{*} = \frac{\lambda\mu\rho\left(\bar{y}_1^{**} - \bar{y}_1^{*}\right) + \lambda\bar{y}_2^{*} + \mu(1-\rho^2\mu)\,\bar{y}_2^{**}}{1-\rho^2\mu^2}.$$

For these values of a and c,

$$Var(\hat{\bar{Y}}_2^{*}) = \frac{1-\rho^2\mu}{1-\rho^2\mu^2}\cdot\frac{S^2}{n}.$$

Alternatively, one may minimize $Var(\hat{\bar{Y}}_2^{*})$ with respect to a and c, find the optimum values of a and c, and then obtain the estimator and its variance.

Till now, we have used SRSWOR on both occasions. We now consider unequal probability sampling schemes on two occasions for estimating $\bar{Y}_2$; we use the same notations as defined in the varying probability schemes.
Sampling Theory
MODULE XII
LECTURE - 40
SAMPLING ON SUCCESSIVE
OCCASIONS

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Des Raj scheme:

Let $s_1^{*}$ be the sample selected by PPSWR from P using x as a size (auxiliary) variable. Then $p_i = \dfrac{x_i}{X_{tot}}$ is the size measure of unit i, where $X_{tot}$ is the population total of the auxiliary variable. Let

$$s_2^{*} = s_{2m}^{*}\cup s_{2u}^{*}$$

where $s_{2m}^{*}$ is an SRSWR sample of size m from $s_1^{*}$ and $s_{2u}^{*}$ is an independent sample selected from P by PPSWR using u draws (m + u = n).

The estimator is

$$\hat{\bar{Y}}_{2des} = \omega t_{2m} + (1-\omega)t_{2u}; \qquad 0\le\omega\le 1,$$

where

$$t_{2m} = \sum_{j\in s_{2m}^{*}}\left(\frac{y_{2j} - y_{1j}}{mp_j}\right) + \sum_{j\in s_1^{*}}\left(\frac{y_{1j}}{np_j}\right), \qquad t_{2u} = \sum_{j\in s_{2u}^{*}}\left(\frac{y_{2j}}{up_j}\right).$$

Assume

$$\sum_{j=1}^{N}P_j\left(\frac{Y_{1j}}{P_j} - Y_1\right)^2 = \sum_{j=1}^{N}P_j\left(\frac{Y_{2j}}{P_j} - Y_2\right)^2 = V_0 \text{ (say)}.$$
For the optimum sampling fraction $\lambda = \dfrac{m}{n}$,

$$Var(\hat{\bar{Y}}_{2des}) = \frac{V_0\left(1 + \sqrt{2(1-\delta)}\right)}{2n}$$

where

$$\delta = \frac{\displaystyle\sum_{i=1}^{N}P_i\left(\frac{Y_{1i}}{P_i} - Y_1\right)\left(\frac{Y_{2i}}{P_i} - Y_2\right)}{\sigma_{pps}(y_1)\,\sigma_{pps}(y_2)}, \qquad Var_{pps}(z) = \sum_{i=1}^{N}P_i\left(\frac{Z_i}{P_i} - Z\right)^2 = \sigma_{pps}^2(z), \qquad Z = \sum_{i=1}^{N}Z_i.$$
(ii) Chaudhuri–Arnab sampling scheme:

Let $s_1^{*}$ be a sample selected by Midzuno's sampling scheme, and

$$s_2^{*} = s_{2m}^{*}\cup s_{2u}^{*}$$

where $s_{2m}^{*}$ is an SRSWOR sample from $s_1^{*}$ and $s_{2u}^{*}$ is a sample of size u from P by Midzuno's sampling scheme. Then an estimator of $\bar{Y}_2$ is

$$\hat{\bar{Y}}_{2ca} = \alpha t_{2m} + (1-\alpha)t_{2u}; \qquad 0\le\alpha\le 1,$$

where

$$t_{2m} = \sum_{j\in s_{2m}^{*}}\left(\frac{(y_{2j} - y_{1j})\,n}{m\pi_j}\right) + \sum_{j\in s_1^{*}}\left(\frac{y_{1j}}{\pi_j}\right), \qquad t_{2u} = \sum_{j\in s_{2u}^{*}}\frac{y_{2j}}{\pi_j^{*}}, \qquad \pi_j = np_j, \quad \pi_j^{*} = up_j.$$

Other similar schemes are also available.


Sampling on more than two occasions:

When there are more than two occasions, one has large flexibility in choosing both the sampling procedure and the estimator. Thus, on occasion i,

 one may have parts of the sample that are matched with occasion (i − 1),

 parts that are matched with occasion (i − 2),

 and so on.

One may consider a single multiple regression estimator based on all previous matchings for the current occasion. However, it has been seen that the loss of efficiency incurred by using the information from only the latest two or three occasions is fairly small in many situations.

Consider the simple sampling design where

$$s_i^{*} = s_{im}^{*}\cup s_{iu}^{*},$$

where $s_{im}^{*}$ is a sample selected by SRSWOR of size $m_i$ from $s_{i-1}^{*}$, and $s_{iu}^{*}$ is a sample selected by SRSWOR of size $u_i\,(= n - m_i)$ from the units not already sampled. Assume $n_i = n$ and $S_i^2 = S^2$ for all i.


On the ith occasion, we therefore have two estimators:

$$t_{iu} = \bar{y}_i^{**} \quad \text{with} \quad Var(t_{iu}) = \frac{S^2}{u_i} = \frac{1}{W_{iu}},$$

and

$$t_{im} = \bar{y}_i^{*} + b_{(i-1),i}\left(\hat{\bar{Y}}_{i-1} - \bar{y}_i^{***}\right)$$

where $b_{(i-1),i}$, the sample regression coefficient of $y_{ij}$ on $y_{(i-1)j}$, is given by

$$b_{(i-1),i} = \frac{\displaystyle\sum_{j\in s_{im}^{*}}\left(y_{(i-1)j} - \bar{y}_i^{***}\right)\left(y_{ij} - \bar{y}_i^{*}\right)}{\displaystyle\sum_{j\in s_{im}^{*}}\left(y_{(i-1)j} - \bar{y}_i^{***}\right)^2}$$

and

$$Var(t_{im}) = \frac{S^2(1-\rho^2)}{m_i} + \rho^2\,Var(\hat{\bar{Y}}_{i-1}) = \frac{1}{W_{im}},$$

assuming that $\rho_{(i-1),i} = \rho$ for $i = 2, 3, \ldots$ and that terms of order $\frac{1}{N}$ are negligible.

The expression for $Var(t_{im})$ is obtained from the variance of the regression estimator under double sampling,

$$Var(\hat{\bar{y}}_{regd}) = S_y^2\left(\frac{1}{n}-\frac{1}{N}\right) - \rho^2 S_y^2\left(\frac{1}{n}-\frac{1}{n^{*}}\right) = \frac{S_y^2(1-\rho^2)}{n} + \frac{\rho^2 S_y^2}{n^{*}},$$

after ignoring the terms of order $\frac{1}{N}$, using $m_i$ in place of n, and replacing $\dfrac{\rho^2 S_y^2}{n^{*}}\;(= \beta^2 Var(\bar{x}^{*}))$ by $\rho^2\,Var(\hat{\bar{Y}}_{i-1})$, since $\beta = \rho$ when the $S_i^2$ are constant. Using weights that are the inverses of the variances, the best weighted estimator from $t_{iu}$ and $t_{im}$ is

$$\hat{\bar{Y}}_i = \omega_i\,t_{iu} + (1-\omega_i)\,t_{im}$$
where

$$\omega_i = \frac{W_{iu}}{W_{iu} + W_{im}}.$$

Then

$$Var(\hat{\bar{Y}}_i) = \frac{1}{W_{iu} + W_{im}} = \frac{g_i S^2}{n} \text{ (say)}, \qquad i = 1, 2, \ldots \; (g_1 = 1).$$

Substituting $\dfrac{1}{W_{iu}} = \dfrac{S^2}{u_i}$ in $\dfrac{1}{W_{iu} + W_{im}} = \dfrac{g_i S^2}{n}$, we have

$$\frac{n}{g_i} = u_i + \frac{1}{\dfrac{1-\rho^2}{m_i} + \dfrac{\rho^2 g_{i-1}}{n}}.$$

Now maximize $\dfrac{n}{g_i}$ with respect to $m_i$ so as to minimize $Var(\hat{\bar{Y}}_i)$. Differentiating $\dfrac{n}{g_i}$ with respect to $m_i$ and equating the derivative to zero, we get

$$\frac{1-\rho^2}{m_i^2} = \left(\frac{1-\rho^2}{m_i} + \frac{\rho^2 g_{i-1}}{n}\right)^2$$

$$\Rightarrow \hat{m}_i = \frac{n\sqrt{1-\rho^2}}{g_{i-1}\left(1+\sqrt{1-\rho^2}\right)}.$$
Now the optimum sampling fraction $\dfrac{\hat{m}_i}{n}$ can be determined successively for i = 2, 3, ... for given values of $\rho$. Substituting $\hat{m}_i$ in the expression for $\dfrac{n}{g_i}$, we have

$$\frac{1}{g_i} = 1 + \frac{\left(1-\sqrt{1-\rho^2}\right)^2}{g_{i-1}\,\rho^2}$$

or

$$q_i = 1 + a\,q_{i-1}$$

where

$$q_i = \frac{1}{g_i}, \qquad q_1 = 1, \qquad a = \frac{1-\sqrt{1-\rho^2}}{1+\sqrt{1-\rho^2}}, \qquad 0 < a < 1.$$

Repeated use of this relation gives

$$q_i = 1 + a(1 + aq_{i-2}) = 1 + a + a^2 q_{i-2} = \cdots = \frac{1-a^i}{1-a} \to \frac{1}{1-a} \;\text{ as } i\to\infty.$$
For sampling an infinite number of times, the limiting variance factor $g_\infty$ is

$$g_\infty = 1 - a = \frac{2\sqrt{1-\rho^2}}{1+\sqrt{1-\rho^2}}.$$

The limiting value of $Var(\hat{\bar{Y}}_i)$ as $i\to\infty$ is

$$\lim_{i\to\infty}Var(\hat{\bar{Y}}_i) = Var(\hat{\bar{Y}}_\infty) = \frac{2S^2\sqrt{1-\rho^2}}{n\left(1+\sqrt{1-\rho^2}\right)}.$$

The limiting value of the optimum sampling fraction as $i\to\infty$ is

$$\lim_{i\to\infty}\frac{\hat{m}_i}{n} = \frac{\hat{m}_\infty}{n} = \frac{\sqrt{1-\rho^2}}{g_\infty\left(1+\sqrt{1-\rho^2}\right)} = \frac{1}{2}.$$

Thus, for the estimation of the current population mean by this procedure, one would not have to match more than 50% of the sample drawn on the last occasion. Unless $\rho$ is very high, say more than 0.8, the reduction in variance $(1-g_h)$ is only modest. A small numerical illustration of these recursions follows.
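This is a minimal sketch, assuming only the recursions derived above, that iterates the variance factor $g_i$ and the optimum matching fraction $\hat{m}_i/n$ for a given $\rho$:

```python
import math

def matching_fractions(rho, occasions=8):
    """Iterate q_i = 1 + a q_{i-1} and the optimum matching fraction
    m_i/n = sqrt(1-rho^2) / (g_{i-1} (1 + sqrt(1-rho^2))) derived above."""
    s = math.sqrt(1 - rho ** 2)
    a = (1 - s) / (1 + s)
    q, g = 1.0, 1.0                          # q_1 = g_1 = 1
    for i in range(2, occasions + 1):
        m_frac = s / (g * (1 + s))           # uses g_{i-1}
        q = 1 + a * q                        # q_i = 1 + a q_{i-1}
        g = 1 / q                            # variance factor g_i
        print(f"i={i}: m_i/n = {m_frac:.3f}, g_i = {g:.3f}")
    print(f"limiting g = {2 * s / (1 + s):.3f}, limiting m/n = 0.5")

matching_fractions(rho=0.8)                  # g -> 0.75, m/n -> 0.5
```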
Type II estimation:

Consider

$$\hat{\bar{Y}}_i = a_i\hat{\bar{Y}}_{i-1} + b_i\bar{y}_{i-1}^{**} + c_i\bar{y}_i^{***} + d_i\bar{y}_i^{**} + e_i\bar{y}_i^{*}.$$

Now

$$E(\hat{\bar{Y}}_i) = (a_i + b_i + c_i)\bar{Y}_{i-1} + (d_i + e_i)\bar{Y}_i.$$

So for unbiasedness,

$$c_i = -(a_i + b_i), \qquad d_i = 1 - e_i.$$

An unbiased estimator is thus of the form

$$\hat{\bar{Y}}_i = a_i\hat{\bar{Y}}_{i-1} + b_i\bar{y}_{i-1}^{**} - (a_i + b_i)\bar{y}_i^{***} + d_i\bar{y}_i^{**} + (1-d_i)\bar{y}_i^{*}.$$

To find the optimum weights, minimize $Var(\hat{\bar{Y}}_i)$ with respect to $a_i$, $b_i$, $d_i$. Alternatively, one can use the fact that $\hat{\bar{Y}}_i$ must be uncorrelated with all unbiased estimators of zero. Thus

$$Cov(\hat{\bar{Y}}_i,\; \bar{y}_{i-1}^{**} - \bar{y}_i^{***}) = 0, \qquad Cov(\hat{\bar{Y}}_{i-1},\; \bar{y}_{i-1}^{**} - \bar{y}_i^{***}) = 0, \qquad Cov(\hat{\bar{Y}}_i,\; \bar{y}_{i-2}^{**} - \bar{y}_{i-1}^{***}) = 0.$$

Using these restrictions, one can find the constants and obtain the estimator.
Sampling Theory

MODULE XIII
LECTURE - 41
NON SAMPLING ERRORS

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
It is a general assumption in sampling theory that the true value of each unit in the population can be obtained
and tabulated without any errors. In practice, this assumption may be violated due to several reasons and
practical constraints. This results in errors in observations as well as in tabulation. Such errors which are due to
factors other than sampling are called non-sampling errors.

The non-sampling errors are unavoidable in census and surveys. The data collected by complete enumeration
in census is free from sampling error but would not remain free from non-sampling errors. The data collected
through sample surveys can have both – sampling errors as well as non-sampling errors. Non-sampling errors
arise because of the factors other than the inductive process of inferring about the population from a sample.

In general, the sampling errors decrease as the sample size increases whereas non-sampling error increases as
the sample size increases.

In some situations, the non-sampling errors may be large and deserve greater attention than the sampling error.

In any survey, it is assumed that the value of the characteristic to be measured has been defined precisely for every population unit. Such a value exists and is unique; it is called the true value of the characteristic for the population unit. In practical applications, the data collected on the selected units are called survey values, and they generally differ from the true values. Such a difference between the true and the observed values is termed an observational error or response error. This error arises mainly from the lack of precision in measurement techniques and from variability in the performance of the investigators.
Sources of non-sampling errors:

Non-sampling errors can occur at every stage of the planning and execution of a survey or census: at the planning stage, at the field work stage, and at the tabulation and computation stage. The main sources of non-sampling errors are

 lack of proper specification of the domain of study and scope of investigation,

 incomplete coverage of the population or sample,

 faulty definition,

 defective methods of data collection and

 tabulation errors.

More specifically, one or more of the following reasons may give rise to nonsampling errors or indicate its presence:

 The data specification may be inadequate and inconsistent with the objectives of the survey or census.

 Due to imprecise definition of the boundaries of area units, incomplete or wrong identification of units, faulty methods of enumeration etc., the data may be duplicated or omitted.

 The methods of interview and observation collection may be inaccurate or inappropriate.

 The questionnaire, definitions and instructions may be ambiguous.

 The investigators may be inexperienced or not trained properly.

 The recall errors may pose difficulty in reporting the true data.

 The scrutiny of data is not adequate.

 The coding, tabulation etc. of the data may be erroneous.

 There can be errors in presenting and printing the tabulated results, graphs etc.

 In a sample survey, the non-sampling errors arise due to defective frames and faulty selection of sampling units.
These sources are not exhaustive but surely indicate the possible sources of errors.

Non-sampling errors may be broadly classified into three categories.

(a) Specification Errors: These errors occur at planning stage due to various reasons, e.g., inadequate and
inconsistent specification of data with respect to the objectives of surveys/census, omission or duplication of units due
to imprecise definitions, faulty method of enumeration/interview/ambiguous schedules etc.

(b) Ascertainment Errors: These errors occur at the field stage due to various reasons, e.g., lack of trained and experienced investigators, recall errors and other types of errors in data collection, lack of adequate inspection and lack of supervision of the primary staff, etc.

(c) Tabulation Errors: These errors occur at tabulation stage due to various reasons, e.g., inadequate scrutiny of
data, errors in processing the data, errors in publishing the tabulated results, graphs etc.

Ascertainment errors may be further sub-divided into

i. Coverage errors owing to over-enumeration or under-enumeration of the population or sample, resulting from
duplication or omission of units and from non-response.

ii. Content errors relating to wrong entries due to errors on the part of investigators and respondents.

The same division can be made in the case of tabulation errors also. There is a possibility of missing or repeated data at the tabulation stage, which gives rise to coverage errors, and of errors in coding, calculations, etc., which give rise to content errors.
Treatment of non-sampling errors:

Some conceptual background is needed for the mathematical treatment of non-sampling errors.

Total error:

Difference between the sample survey estimate and the parametric true value being estimated is termed as total error.

Sampling error:

If complete accuracy can be ensured in the procedures such as determination, identification and observation of
sample units and the tabulation of collected data, then the total error would consist only of the error due to sampling,
termed as sampling error.

The measure of the sampling error is the mean squared error (MSE). The MSE is the expected squared difference between the estimator and the true value, and it has two components:

• the square of the sampling bias, and

• the sampling variance.

If the results are also subject to non-sampling errors, then the total error would have both sampling and non-sampling
error.

Total bias:

The difference between the expected value and the true value of the estimator is termed as total bias. This consists of
sampling bias and nonsampling bias.
Non-sampling bias:

For the sake of simplicity, assume that the following two steps of randomization are involved:

i. selection of the sample of units, and

ii. selection of the survey personnel.

Let $\hat{Y}_{sr}$ be the estimate of the population mean $\bar{Y}$ based on the sth sample of units supplied by the rth sample of the survey personnel. The conditional expected value of $\hat{Y}_{sr}$, taken over the second step of randomization for a fixed sample of units, is

$$E_r(\hat{Y}_{sr}) = \hat{Y}_{so},$$

which may be different from $\hat{Y}_s$, the estimate based on the true values of the units in the sample. The expected value of $\hat{Y}_{so}$ over the first step of randomization gives

$$E_s(\hat{Y}_{so}) = Y^{*},$$

which is the value for which an unbiased estimator can be had by the specified survey process. The value $Y^{*}$ may be different from the true population mean $\bar{Y}$, and the total bias is given by

$$Bias_t(\hat{Y}_{sr}) = Y^{*} - \bar{Y}.$$

The sampling bias is given by

$$Bias_s(\hat{Y}_s) = E_s(\hat{Y}_s) - \bar{Y}.$$
The non-sampling bias is

$$Bias_r(\hat{Y}_{sr}) = Bias_t(\hat{Y}_{sr}) - Bias_s(\hat{Y}_s) = Y^{*} - E_s(\hat{Y}_s) = E_s(\hat{Y}_{so} - \hat{Y}_s),$$

which is the expected value of the non-sampling deviation.

In the case of complete enumeration, there is no sampling bias and the total bias consists only of the non-sampling bias. In the case of sample surveys, the total bias consists of both the sampling and the non-sampling biases.

The non-sampling bias in a census can be estimated by surveying a sample of units in the population using better techniques of data collection and compilation than those adopted under the general census conditions. Surveys called post-enumeration surveys, which are usually conducted just after the census for studying the quality of the census data, may be used for this purpose.

In a large scale sample survey, the ascertainment bias can be estimated by resurveying a sub-sample of the original sample using better survey techniques.

Another method of checking survey data is to compare the values of the units obtained in two surveys and to reconcile the discrepant figures by further investigation. This method of checking is termed reconciliation (check) surveys.
Non-sampling variance:

The MSE of $\hat{Y}_{sr}$, based on the sth sample of units and supplied by the rth sample of the survey personnel, is

$$MSE(\hat{Y}_{sr}) = E_{sr}(\hat{Y}_{sr} - \bar{Y})^2$$

where $\bar{Y}$ is the true value being estimated. This takes into account both sampling and non-sampling errors, i.e.,

$$MSE(\hat{Y}_{sr}) = Var(\hat{Y}_{sr}) + \left[Bias(\hat{Y}_{sr})\right]^2 = E(\hat{Y}_{sr} - Y^{*})^2 + (Y^{*} - \bar{Y})^2$$

where $Y^{*}$ is the expected value of the estimator taken over both steps of randomization.

Taking the variance over the two steps of randomization, we get

$$Var_{sr}(\hat{Y}_{sr}) = Var_s\left[E_r(\hat{Y}_{sr})\right] + E_s\left[Var_r(\hat{Y}_{sr})\right] = \underbrace{Var_s\left[\hat{Y}_{so}\right]}_{\text{sampling variance}} + \underbrace{E_s\left[E_r(\hat{Y}_{sr} - \hat{Y}_{so})^2\right]}_{\text{non-sampling variance}}.$$
Note that

$$\hat{Y}_{sr} - \hat{Y}_{so} = (\hat{Y}_{sr} - \hat{Y}_{so} - \hat{Y}_{or} + Y^{*}) + (\hat{Y}_{or} - Y^{*})$$

where $\hat{Y}_{or} = E_s(\hat{Y}_{sr})$, so

$$E(\hat{Y}_{sr} - \hat{Y}_{so})^2 = \underbrace{E_{sr}(\hat{Y}_{sr} - \hat{Y}_{so} - \hat{Y}_{or} + Y^{*})^2}_{\text{interaction between sampling and non-sampling errors}} + \underbrace{E_r(\hat{Y}_{or} - Y^{*})^2}_{\text{variance between survey personnel}}.$$

The MSE of an estimator thus consists of

• the sampling variance,

• the interaction between sampling and non-sampling errors,

• the variance between survey personnel, and

• the square of the sum of the sampling and non-sampling biases.

In a complete census, the MSE is composed of only the non-sampling variance and the square of the non-sampling bias.
Non-response error:

The non-response error may occur due to refusal by respondents to give information, or because the sampling units are inaccessible. This error arises because the set of units getting excluded may have characteristics so different from the set of units actually surveyed as to make the results biased. It is termed a non-response error since it arises from the exclusion of some of the anticipated units in the sample or population. One way of dealing with the problem of non-response is to make all efforts to collect information from a sub-sample of the units not responding in the first attempt.

Measurement and control of errors:

Some suitable methods and adequate procedures for control can be adopted before initiating the main census or
sample survey. Some separate programmes for estimating the different types of non-sampling errors are also
required. Some such procedures are as follows:

1. Consistency checks:

Certain items can be added to the questionnaires which may serve as a check on the quality of the collected data. To locate doubtful observations, the data can be arranged in increasing order of some basic variable and then plotted against each sample unit. Such a graph is expected to follow a certain pattern, and any deviation from this pattern helps in spotting the discrepant values.
2. Sample check

An independent duplicate census or sample survey can be conducted on a comparatively smaller group by trained
and experienced staff. If the sample is properly designed and if the checking operation is efficiently carried out, it is
possible to detect the presence of non-sampling errors and to get an idea of their magnitude. Such procedure is
termed as method of sample check.

3. Post-census and post-survey checks:

This is a type of sample check in which a sample (or sub-sample) of the units covered in the census (or survey) is selected and re-enumerated or re-surveyed by better trained and more experienced survey staff than those involved in the main investigation. This procedure is called a post-census or post-survey check. The effectiveness of such check surveys can be increased by

• re-enumerating or re-surveying immediately after the main census to avoid recall errors,

• taking steps to minimize the conditioning effect that the main survey may have on the work of the check survey.

4. External record check:

Take a sample of relevant units from a different source, if available, and check whether all the units have been enumerated in the main investigation and whether there are discrepancies between the values when matched. The list from which the check-sample is drawn for this purpose need not be a complete one.
5. Quality control techniques:

Tools of statistical quality control, such as control charts and acceptance sampling techniques, can be used in assessing the quality of the data and in improving the reliability of the final results in large scale surveys and censuses.

6. Study of recall error:

Response errors arise due to various factors like the attitude of the respondents towards the survey, the method of interview, the skill of the investigators and recall errors. Recall error depends on the length of the reporting period and on the interval between the reporting period and the date of the survey. One way of studying recall error is to collect and analyze data relating to more than one reporting period in a sample (or sub-sample) of the units covered in the census or survey.

7. Interpenetrating sub-samples:

The interpenetrating sub-sample technique helps in providing an appraisal of the quality of the information, as the interpenetrating sub-samples can be used to secure information on non-sampling errors such as differences arising from differential interviewer bias, different methods of eliciting information, etc. After the sub-samples have been surveyed by different groups of investigators and processed by different teams of workers at the tabulation stage, a comparison of the final estimates based on the sub-samples provides a broad check on the quality of the survey results.
