Unit 4

Sampling: Basic Concepts: Defining the Universe, Concepts of Statistical Population, Sample,
Characteristics of a good sample. Sampling Frame (practical approach for determining the sample frame
expected), Sampling errors, Non Sampling errors, Methods to reduce the errors, Sample Size constraints,
Non Response.
Probability Sample: Simple Random Sample, Systematic Sample, Stratified Random Sample, Area Sampling
& Cluster Sampling. Non Probability Sample: Judgment Sampling, Convenience Sampling, Purposive
Sampling, Quota Sampling & Snowballing Sampling methods. Determining size of the sample – Practical
considerations in sampling and sample size, sample size determination.

A sample is a smaller set of data that a researcher chooses or selects from a larger population using a
pre-defined selection method. The elements of a sample are known as sample points, sampling units, or
observations. Creating a sample is an efficient method of conducting research.


Universe is the set of all experimental units from which a sample is to be drawn. Population is the set of all
values of the variables to be studied from those experimental units. Thus, a sample drawn from the universe
(a U-sample) contains experimental units, whereas a sample drawn from the population (a P-sample) contains data.

Concepts of Statistical Population

In statistics, a population is the entire set of items from which you draw data for a statistical study. It can
be a group of individuals, a set of items, etc. It makes up the data pool for a study. In everyday usage,
population refers to the people who live in a particular area at a specific time.

Characteristics of a good sample

A good sample has the following characteristics:

 A sample should have a clear goal. ...
 A good sample should be an accurate representation of the entire universe or population. ...
 A good sample is free from bias. ...
 A sample should be chosen randomly. ...
 The adequacy of a sample is essential. ...
 A sample should be proportional.


Before selecting the sample, the population must be divided into parts called sampling
units or simply sample units.
A sampling frame is the collection of all of the sampling units.
The sampling frame is the list from which units are drawn for the sample. The 'list'
may be an actual listing of units, as in a phone book from which phone numbers will
be sampled, or some other description of the population, such as a map from which areas will be
sampled.


 Sampling Errors→ Generally, marketing research studies are based on samples of people or products.
The results emerging from such studies are then generalized, that is, applied to the entire population.
For example, if a study is done amongst Maruti car owners in a city to know their average monthly
expenditure on the maintenance of their cars, it can be done either by covering all Maruti car owners
residing in that city or by choosing a sample, say 10%, of the total Maruti car owners.
When a sample is taken, the study may give an average that differs from the actual average that would be
obtained if the entire population were covered. The difference between the sample value and the
corresponding population value is known as the sampling error. This error tends to decrease as the size of
the sample increases, so it is usually small when a sample of sufficient size has been taken from the
relevant population. In a complete enumeration or census survey, sampling errors are not present because
no sampling is done.
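
The link between sample size and sampling error can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of 5,000 monthly maintenance figures is randomly generated (hypothetical, not real data), and the function name `mean_sampling_error` is made up for this illustration.

```python
import random
import statistics

def mean_sampling_error(population, sample_size, trials=200, seed=0):
    """Average absolute gap between the sample mean and the population mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        # Simple random sample drawn without replacement.
        sample = rng.sample(population, sample_size)
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)

# Hypothetical population: monthly maintenance spend of 5,000 car owners.
_rng = random.Random(42)
population = [_rng.gauss(2000, 600) for _ in range(5000)]

# The last size equals the population size, i.e. a census.
for n in (50, 500, 5000):
    print(n, round(mean_sampling_error(population, n), 2))
```

The average error should shrink as the sample grows and vanish when the sample covers the whole population, which mirrors the census case described above.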

 Non-Sampling Errors→ Non-sampling errors, as the name implies, are all those errors that occur at
different stages of research other than the selection of the sample. These errors are many and
varied. A non-sampling error can arise right at the beginning, when the problem is defined wrongly. It
can also occur at any of the subsequent stages, such as designing the questionnaire, non-response to the
questionnaire, or the analysis and interpretation of data.

Following are the various types of non-sampling errors:

(a) Defective Problem Definition→ The problem on which research is to be undertaken should be precisely
defined. For example, a study on unemployment must be clear as to the concept of unemployment, the
reference period, the geographic area to be covered, etc. If any of these basic concepts is wrongly
defined, the results of the study will turn out to be wrong.
(b) Defective Population Definition→ If the population is not well defined and does not fit the objectives
of the research study, an error occurs. In other words, selecting an inappropriate population causes this
error. Suppose a study is undertaken to know the views of employees on incentives offered by a
company. If the study defines its population as male employees and collects data only from them, the
exclusion of female employees would be a source of error.
(c) Surrogate (Substitute) Information Error→ This type of error occurs when the information sought
(required/wanted) by the researcher is different from the information needed to solve the problem. For
example, when price of a brand is taken to represent its quality. In such a case, it is presumed that
higher the price of the brand, the better is its quality. This may or may not be true.
(d) Non-Response Error→ It occurs when respondents refuse to give any information and refuse to
cooperate with the interviewer by not answering his questions. It also occurs when respondents are
away from home when the interviewer calls on them. In case of mail survey, the extent of non-response
is usually high.
(e) Response Error→ This is caused when the information collected differs from the information
required because respondents give wrong answers. For example, respondents are asked to
indicate whether they own a color television set. Some of them may respond YES just to boost
their image before the interviewer, even though they do not own a color
television set. Such responses result in a response error.
(f) Poor Questionnaire Design→ A questionnaire is an instrument to collect data from respondents in a
survey. If the questionnaire is defective, the data collected on that basis will be misleading. For
example, if one or more questions are wrongly worded conveying a different meaning, wrong data will
be collected through responses to such questions.
(g) Interviewer Bias→ This error occurs on account of interviewer’s influence in conducting an interview
or wrong recording by him. By putting emphasis on a certain word or phrase in a questionnaire,
interviewers can influence respondents to answer in a particular way.
(h) Data Processing Error→ After the data have been collected, they must be processed. This involves
coding the responses, recording the codes, etc., so that the collected data can be transformed into suitable
tables. Mistakes can occur during this processing stage.
(i) Data Analysis Error→ Errors can occur on account of wrong analysis of data. Apart from simple
mistakes in summation, division, etc., more complex errors can occur. For example, the application of a
wrong statistical technique can cause such errors.
(j) Interpretation Error→ Sometimes wrong interpretation of data can cause this type of error. In order
to support a particular line of action, the researcher may deliberately misinterpret data. Wrong
interpretation of data may occur at times without one being aware of it.

Methods to reduce the Sampling Error

 Increase sample size: A larger sample size results in a more accurate result because the study gets closer
to the actual population size.
 Divide the population into groups: Sample groups according to their share of the population instead of
drawing a single random sample. For example, if people of a specific demographic make up 20% of the
population, make sure that about 20% of your sample comes from that demographic to reduce sampling bias.
 Know your population: Study your population and understand its demographic mix. Know what
demographics use your product and service and ensure you only target the sample that matters.

Minimizing Non-Sampling Error

 Send Reminders to the respondents
 Ensure Confidentiality
 Thoroughly Pretest your Survey Mediums
 Ensure Accuracy of Data Entry

Sample Size constraints

1. Effects of Small Sample Size

In the sample size formula, the sample size is proportional to the square of the Z-score and inversely
proportional to the square of the margin of error. Consequently, if the margin of error is held fixed, reducing
the sample size reduces the confidence level of the study, which is related to the Z-score. If the confidence
level is held fixed, decreasing the sample size increases the margin of error.

A small sample size also affects the reliability of a survey's results because it leads to higher variability
in the estimates. Small samples are also more exposed to bias, most commonly bias resulting from non-response.
Non-response occurs when some of the subjects selected for the survey cannot be reached or do not take part.

A small sample size may make it difficult to determine if a particular outcome is a true finding, and in some
cases a type II error may occur, i.e., a false null hypothesis is not rejected and no difference between the
study groups is reported even though one exists.
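
The effect of sample size on the margin of error can be checked numerically. The sketch below assumes the usual relation, margin of error = Z-score × standard deviation / √(sample size), which these notes do not state explicitly; the function name and the default values (Z-score 1.96, standard deviation 0.5) are illustrative choices.

```python
import math

def margin_of_error(sample_size, z_score=1.96, std_dev=0.5):
    """Margin of error for a given sample size (assumed relation: z * sd / sqrt(n))."""
    return z_score * std_dev / math.sqrt(sample_size)

# Smaller samples give a wider margin of error at the same confidence level.
for n in (25, 100, 400, 1600):
    print(n, round(margin_of_error(n), 4))
```

Under this relation, quartering the sample size doubles the margin of error, so precision is lost quickly as the sample shrinks.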

2. Effects of Large Sample Size

No doubt, larger sample sizes provide more accurate mean values, identify outliers that could skew the data
in a smaller sample and provide a smaller margin of error.
But the use of a larger number of cases can also involve more financial and human resources than necessary
to obtain the desired response.
In addition to this factor, there is another noteworthy issue that has to do with statistics.

3. Non- Response

Non-response often arises when a survey is poorly designed. For example, excessively long surveys without
incentives may cause a large percentage of people to not complete the survey.
Because non-response reduces the effective sample size, the precision of estimates will be lower and the
margin of error will be larger.


Non-Probability Sampling      Probability Sampling

Convenience Sampling          Simple Random Sampling
Judgment Sampling             Systematic Sampling
Quota Sampling                Stratified Sampling
Snowball Sampling             Cluster Sampling



1. Non-Probability Sampling→ It is also known as deliberate or purposive sampling. The non-probability
sampling method is a technique in which the researcher selects the sample based on subjective judgment
rather than random selection. In this method, not all the members of the population have a chance to
participate in the study.
In such a design, the personal element has a great chance of entering into the selection of the sample. Thus,
there is always the danger of bias entering into this type of sampling technique. It includes the following
methods:

a) Convenience Sampling→ In the convenience sampling method, the samples are selected from the
population simply because they are conveniently available to the researcher. The samples are easy
to select, but the researcher makes no attempt to choose a sample that represents the entire population.

For example, you are researching opinions about student support services in your university, so after
each of your classes, you ask your fellow students to complete a survey on the topic. This is a convenient
way to gather data, but as you only surveyed students taking the same classes as you at the same level,
the sample is not representative of all the students at your university.
b) Judgment Sampling→ In judgment sampling, the samples are selected solely on the basis of the researcher’s
knowledge. Because that knowledge is instrumental in creating the sample, a well-informed researcher has a
chance of obtaining accurate answers with a small margin of error. It is also known as purposive
sampling or authoritative sampling.

For instance, suppose researchers want to understand the thought process of people interested in studying for
their master’s degree. The selection criterion will be: “Are you interested in doing your masters in …?” and
those who respond with a “No” are excluded from the sample.
In practice, however, this approach has often been found to produce unsatisfactory results, and there is no
objective way of evaluating the accuracy of sample results. Despite these limitations, this method may be
useful when the total sample size is extremely small.
c) Quota Sampling→ Quota sampling is one of the most commonly used non-probability sampling methods.
Under this method, the researcher first divides all items in the universe into groups, generally on a
demographic basis such as age, gender, income, etc., and then allots a fixed number of items to each
group, called a quota, to be selected from the universe. Field workers are then instructed to conduct
interviews until the designated quotas are filled. The individual respondents within each quota are
chosen deliberately by the field worker.

For example, a food manufacturer wished to sample current users of the company’s brand to obtain
their reactions to proposed new packaging. A quota sample of brand users, divided by age within
gender, was designed with the following quotas:
Brand Users        Quota
Men, 18 – 34       50
Men, 35 – 49       50
Women, 18 – 34     100
Women, 35 – 49     100
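
The mechanics of filling quotas can be sketched in a few lines. This is a minimal illustration, not a fieldwork procedure: the quotas come from the table above, while the respondent records, their arrival order, and the function name `fill_quotas` are hypothetical.

```python
QUOTAS = {
    "Men, 18-34": 50,
    "Men, 35-49": 50,
    "Women, 18-34": 100,
    "Women, 35-49": 100,
}

def fill_quotas(respondents, quotas):
    """Accept respondents in the order they are met until every quota is full."""
    remaining = dict(quotas)
    selected = []
    for person in respondents:
        group = person["group"]
        if remaining.get(group, 0) > 0:
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return selected, remaining

# Hypothetical stream of brand users met by a field worker.
groups = list(QUOTAS)
respondents = [{"id": i, "group": groups[i % 4]} for i in range(1000)]
selected, remaining = fill_quotas(respondents, QUOTAS)
print(len(selected), remaining)
```

Note that nothing here is random: whoever is met first fills the quota, which is why quota sampling is classed as a non-probability method.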
d) Snowball Sampling→ In this method, a set of respondents is selected initially and interviewed.
After this, these respondents are asked to name other people who, in their opinion, are part
of the target population. It is like setting a ball in motion: referrals are
obtained from referrals, creating a snowball effect in which the sample keeps growing in size as it rolls
along. This technique has the advantage of locating the right people with the desired characteristics at a
low cost.

For example, consider surveys to gather information about HIV/AIDS. Not many affected people will readily
respond to the questions. Still, researchers can contact people they might know or volunteers associated with
the cause to get in touch with them and collect information.

2. Probability Sampling→ It is also known as ‘random sampling’ or ‘chance sampling’. Under this
sampling design, every item of the universe has a known, non-zero chance of inclusion in the sample (an
equal chance in the case of simple random sampling). It is generally considered the best technique for
selecting a sample because the sample tends to have the same composition and characteristics as the
universe. It includes the following sampling methods:
a) Simple Random Sampling→ This is the simplest and most popular technique of sampling. In it, each
unit of the population has an equal chance of being included in the sample. The method of simple random
sampling eliminates the chance of bias or personal preference in the selection of units.
Suppose we want to select a simple random sample of 200 students from a school of 500 students. Here, we can
assign a number from 1 to 500 to every student in the school database and use a random number generator to
select a sample of 200 numbers.
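
The student example can be sketched with Python's standard `random` module. The function name `simple_random_sample` and the fixed seed are illustrative assumptions; the seed is there only to make the draw repeatable.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw distinct student numbers from 1..population_size, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)

# 200 of the student numbers 1 to 500, drawn without replacement.
chosen = simple_random_sample(500, 200, seed=1)
print(sorted(chosen)[:10])
```

Because `random.sample` draws without replacement, no student number can appear twice in the sample.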
b) Systematic Sampling→ In some instances, the most practical way of sampling is to select every nth item
on a list. The items are selected from the target population by choosing a random starting point and
then selecting the other units at a fixed sampling interval. The interval is calculated by dividing the total
population size by the desired sample size.
Suppose the names of 300 students of a school are sorted in reverse alphabetical order. To select a
systematic sample of 20 students, the sampling interval is 300 / 20 = 15. We randomly select a starting
number between 1 and 15, say 5. From number 5 onwards, we select every 15th person from the sorted list
(5, 20, 35, ..., 290). Finally, we end up with a sample of 20 students.
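
A minimal sketch of systematic selection, using the 300-name list with a starting position of 5 and an interval of 15. The function name `systematic_sample` and the placeholder student names are hypothetical.

```python
import random

def systematic_sample(items, interval, start=None, seed=None):
    """Pick every `interval`-th item, counting positions from 1."""
    if start is None:
        # Random starting position within the first interval.
        start = random.Random(seed).randint(1, interval)
    return [items[pos - 1] for pos in range(start, len(items) + 1, interval)]

names = [f"Student {i}" for i in range(1, 301)]  # already sorted list of 300 names
sample = systematic_sample(names, interval=15, start=5)
print(len(sample), sample[:3], sample[-1])
```

With these inputs the selected positions are 5, 20, 35 and so on up to 290, which gives 20 names.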
c) Stratified Random Sampling→ If a population from which a sample is to be drawn does not
constitute a homogeneous group, the stratified sampling technique is generally applied in order to obtain a
representative sample. Under this sampling, the population is divided into several sub-populations that
are individually more homogeneous than the total population (the different sub-populations are called
‘strata’), and then items are selected from each stratum to make up the sample. Since each stratum is more
homogeneous than the total population, we are able to get more accurate estimates for each stratum.
For example, a researcher wants to have the views of car holders of Tata & Maruti. For this, he wants
a sample of 100 car holders: 50 Tata car holders and 50 Maruti car holders. But there are several
models of Maruti cars as well as of Tata cars. So, he may form strata from the various models of Maruti &
Tata cars in the following way and select car holders randomly from each stratum.
Stratum                Car holders
Zen                    15
Alto                   15
Wagon R                15
Swift                  05
Sub Total (Maruti)     50
Indica                 35
Indigo                 15
Sub Total (Tata)       50
Total                  100
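
The allocation in the table can be turned into a short sketch of stratified selection. The per-model allocation is taken from the table; the owner lists (200 hypothetical owners per model) and the function name `stratified_sample` are assumptions made only for illustration.

```python
import random

ALLOCATION = {"Zen": 15, "Alto": 15, "Wagon R": 15, "Swift": 5,
              "Indica": 35, "Indigo": 15}

def stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample of the allotted size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(strata[name], size)
            for name, size in allocation.items()}

# Hypothetical sampling frame: 200 registered owners for every model.
strata = {model: [f"{model} owner {i}" for i in range(1, 201)]
          for model in ALLOCATION}

sample = stratified_sample(strata, ALLOCATION, seed=1)
print({model: len(owners) for model, owners in sample.items()})
print(sum(len(owners) for owners in sample.values()))
```

Selection is random within each stratum, but the stratum sizes themselves are fixed in advance by the researcher.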
d) Cluster Sampling→ In case of large universe, the whole population is divided into a number of
smaller non-overlapping groups called clusters, and then instead of selecting items from each cluster,
we randomly select those clusters or groups. In other words, in this sampling, the total population is
divided into a number of relatively small subdivisions called clusters and then some of these clusters
are randomly selected for inclusion in the overall sample.
For example, we want to estimate the proportion of machine parts in an inventory which are defective.
Also assume that there are 10,000 machine parts in the inventory at a given point of time, stored in 200
cases of 50 each. Now using a cluster sampling, we would consider the 200 cases as clusters and
randomly select ‘n’ number of cases and examine all the machine parts in each randomly selected case.
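
The machine-parts example can be sketched as follows. The 200 cases of 50 parts match the example; the defect flags (generated with an arbitrary 3% defect rate), the choice of 10 cases, and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and return every unit inside them."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    units = [unit for cid in chosen_ids for unit in clusters[cid]]
    return chosen_ids, units

# Hypothetical inventory: 200 cases of 50 parts, True marks a defective part.
_rng = random.Random(7)
cases = {case_id: [_rng.random() < 0.03 for _ in range(50)]
         for case_id in range(1, 201)}

chosen, parts = cluster_sample(cases, n_clusters=10, seed=1)
estimate = sum(parts) / len(parts)  # share of defective parts in the examined cases
print(chosen, len(parts), round(estimate, 3))
```

Only the cases are chosen at random; every part inside a chosen case is examined.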
e) Area Sampling→ If the clusters happen to be geographic subdivisions, cluster sampling is
better known as area sampling. In other words, cluster samples in which the primary sampling unit
represents a cluster of units based on geographic area are distinguished as area samples.
For example, if a researcher wants to have the views of consumers in the most populated cities in India,
then he can select some cities from Delhi, Mumbai, Kolkata, Chennai, Bangalore,
Hyderabad, etc. Similarly, if he wants views from hilly states, then he chooses from Jammu & Kashmir, Himachal
Pradesh, Uttarakhand, Sikkim, Nagaland, Assam, etc. In the case of the rainiest states, he can choose from
Maharashtra, Meghalaya, Kerala, etc.
f) Multi-Stage Sampling→ Multi-stage sampling is a further development of cluster sampling.
Sometimes, sampling is done in stages to reduce the cost of the survey and save time. This
method is mainly applied in big inquiries extending over a large geographical area, say, the entire country. In
this method, the population is divided into first-stage sampling units and a random sample of these units
is selected. Each selected first-stage unit is then divided into second-stage units, and a random sample is again
taken from these second-stage units. This process can be continued for a number of stages and is called
multi-stage random sampling.
For example, suppose we want to investigate the working efficiency of nationalized banks in India and need to
take a sample of a few banks for this purpose.
I Stage:- Select certain states randomly from all Indian states.
II Stage:- Select certain districts randomly from selected states in I stage.
III Stage:- Select certain towns randomly from selected districts in II stage.
IV Stage:- Select certain banks randomly from selected towns in III stage.
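
The four stages can be sketched as repeated random selection over a nested frame. Everything concrete here is hypothetical: the frame of 5 states, 4 districts per state, 3 towns per district and 6 banks per town, the number of units picked at each stage, and the function name `multistage_sample`.

```python
import random

def multistage_sample(frame, picks, seed=None):
    """frame: nested dicts (state -> district -> town -> list of banks).
    picks: how many units to draw at each stage, outermost first."""
    rng = random.Random(seed)

    def draw(node, depth):
        if isinstance(node, dict):
            # Intermediate stage: sample some units, then go one level down.
            chosen = rng.sample(sorted(node), min(picks[depth], len(node)))
            result = []
            for key in chosen:
                result.extend(draw(node[key], depth + 1))
            return result
        # Final stage: sample the banks themselves.
        return rng.sample(node, min(picks[depth], len(node)))

    return draw(frame, 0)

# Hypothetical frame with made-up names.
frame = {
    f"State{s}": {
        f"District{s}.{d}": {
            f"Town{s}.{d}.{t}": [f"Bank{s}.{d}.{t}.{b}" for b in range(1, 7)]
            for t in range(1, 4)
        }
        for d in range(1, 5)
    }
    for s in range(1, 6)
}

banks = multistage_sample(frame, picks=(2, 2, 2, 3), seed=3)
print(len(banks), banks[:3])
```

Only the units chosen at one stage are subdivided at the next, so a full list of banks is never needed for the whole country.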
Determining size of the sample

‘Sample size’ is a market research term used for defining the number of individuals included to conduct
research. Researchers choose their sample based on demographics, such as age, gender, or physical location.
Samples can be vague or specific.

What size sample do I need?

The answer to this question is influenced by a number of factors, including the purpose of the study,
population size, the risk of selecting a "bad" sample, and the allowable sampling error.
Three criteria will need to be specified to determine the appropriate sample size:
 The level of precision (refers to how close the sample estimate is to the true population value)
 The level of confidence or risk (reflects the level of certainty that the sample estimates will actually hold true
for the population)
 The degree of variability in the attributes being measured

Optimum sample size determination is required for the following reasons:

 To allow for appropriate analysis
 To provide the desired level of accuracy
 To allow valid tests of significance

Basic principles

Determining the sample size depends on many things. It requires much more thought than any theoretical
discussion portrays.
 What is being sampled?
 How is the sample taken?
 The cost of the sampling
 The timing of when an answer is needed
 The situation and reliability necessary
 The consequences of making a wrong decision
 The theory

Terms used around the sample size

Population size: Population size is how many people fit your demographic. For example, you want to get
information on doctors residing in North America. Your population size is the total number of doctors in
North America. Don’t worry! Your population size doesn’t always have to be that big. Smaller population
sizes can still give you accurate results as long as you know who you’re trying to represent.

Confidence level: Confidence level tells you how sure you can be that your results reflect the population. It is
expressed as a percentage and is tied to the confidence interval. For example, a confidence level of 90% means
that if the survey were repeated many times, about 90% of the resulting confidence intervals would contain
the true population value.
The margin of error (confidence interval): When it comes to surveys, there’s no way to be 100% accurate.
Confidence intervals tell you how far off from the population mean you’re willing to allow your data to fall.
A margin of error describes how close you can reasonably expect a survey result to fall relative to the real
population value.

Standard deviation: Standard deviation is the measure of the dispersion of a data set from its mean. It
measures the absolute variability of a distribution. The higher the dispersion or variability, the greater the
standard deviation and the greater the magnitude of the deviation. For example, you have already sent out
your survey. How much variance do you expect in your responses? That variation in response is the standard
deviation.

Sample size determination

The sample size can be determined using a sample size calculation formula known as Andrew Fisher’s Formula:
 Determine the population size (if known).
 Determine the confidence interval.
 Determine the confidence level.
 Determine the standard deviation (a standard deviation of 0.5 is a safe choice where the figure is
unknown).
 Convert the confidence level into a Z-Score. The table shows the z-scores for the most common
confidence levels.
 Put these figures into the sample size formula to get your sample size.

Confidence level z-score

80% 1.28
85% 1.44
90% 1.65
95% 1.96
99% 2.58

Here is an example calculation:

Say you choose to work with a 95% confidence level, a standard deviation of 0.5, and a confidence interval
(margin of error) of ± 5%, you just need to substitute the values in the formula:
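\[
\begin{aligned}
\text{Sample size} &= \frac{(1.96)^2 \times 0.5 \times 0.5}{(0.05)^2} \\
&= \frac{3.8416 \times 0.25}{0.0025} \\
&= \frac{0.9604}{0.0025} \\
&= 384.16
\end{aligned}
\]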

Rounding up to the next whole number, your sample size should be 385.
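
The calculation can be reproduced with a short function. This is a sketch under an assumption: the general formula is taken to be (Z-score)² × StdDev × (1 − StdDev) / (margin of error)², which matches the arithmetic of the worked example but is not written out in these notes. The z-scores come from the table above; the function and variable names are illustrative.

```python
import math

# z-scores for the common confidence levels listed in the table.
Z_SCORES = {80: 1.28, 85: 1.44, 90: 1.65, 95: 1.96, 99: 2.58}

def sample_size(confidence_level, std_dev=0.5, margin_of_error=0.05):
    """Sample size from the assumed formula z^2 * sd * (1 - sd) / e^2, rounded up."""
    z = Z_SCORES[confidence_level]
    raw = (z ** 2) * std_dev * (1 - std_dev) / margin_of_error ** 2
    return math.ceil(raw)  # a fraction of a respondent is rounded up

print(sample_size(95))  # the worked example: 95% confidence, 0.5, ±5%
print(sample_size(99))
```

Rounding up rather than to the nearest whole number keeps the margin of error at or below the chosen limit.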

