1.1.1. Meaning and Definition of Sampling Design


1.1. SAMPLING DESIGN


1.1.1. Meaning and Definition of Sampling Design

Sampling is a process used in statistical analysis in which a predetermined number of observations is taken
from a larger population.
Sampling is an essential technique of behavioural research; much research work cannot be undertaken without it.
Studying the total population is usually neither possible nor practicable. The practical limitations of cost,
time and other factors that usually operate in the situation stand in the way of studying the total
population. The concept of sampling has been introduced with a view to making research findings economical
and accurate. For example, a fruit merchant does not examine each and every apple or mango. He inspects only a
few of them and decides whether or not to purchase. The most important aim of sampling is to obtain
maximum information about the population under study with the least use of money, labour, and time.
A sample design is a definite plan for obtaining a sample from the sampling frame. Sampling design, in
general, refers to the method or technique the researcher adopts in selecting the sampling units from the frame
or population.
A sample design is the framework, or road map, that serves as the basis for the selection of a survey sample and
affects many other important aspects of a survey as well. In a broad context, survey researchers are interested in
obtaining some type of information through a survey for some population, or universe, of interest. One must
define a sampling frame that represents the population of interest, from which a sample is to be drawn. The
sampling frame may be identical to the population, or it may be only part of it and is therefore subject to some
undercoverage, or it may have an indirect relationship to the population (e.g., the population is preschool
children and the frame is a listing of preschools). The sample design provides the basic plan and methodology
for selecting the sample. A sample design can be simple or complex.

1.1.2. Basic Concepts of Sampling Design

Some basic terms are needed to understand the concept of sampling design:
1) Universe/Population: The population or universe represents the entire group of units which is the focus of
the study. Thus, the population could consist of all the persons in the country, or those in a particular
geographical location, or a special indigenous or economic group, depending on the purpose and coverage
of the study. A population could also consist of non-human units such as farms, houses or business
establishments. For example, if an investigation is to be conducted on the marks obtained in statistics by
the students of a class, then all the students of that class in that subject will be the Universe. If that
class consists of 50 students, the same 50 students will form the Universe.
An aggregate of objects (animate or inanimate) under study is called a population or universe. It is thus a
collection of individuals or of their attributes (qualities) or of results of operations which can be
numerically specified.
i) Finite Universe: A universe having a finite number of entities or members is called a finite universe.
For example, the universe of the weights of students in a particular class or the universe of smokers in
Rohtak district.
ii) Infinite Universe: A universe with an infinite number of members is known as an infinite universe. For
example, the universe of pressures at various points in the atmosphere.

[Figure 3.1: Sampling Design Process. Define the Universe, Sample Frame, Specifying Sampling Units,
Selection of Sample Design, Determination of Sample Size, Select the Sample]


2) Statistical Population: A statistical population is a set of entities concerning which statistical inferences
are to be drawn, often based on a random sample taken from the population. For example, if we were
interested in generalisations about crows, then we would describe the set of crows that is of interest.
Notice that if we choose a population like all crows, we will be limited to observing crows that exist now or
will exist in the future. Probably, geography will also constitute a limitation in that our resources for
studying crows are also limited.
A statistical population is an aggregate of measurable quantities, that is, a set of numbers. When every
element of such a set is characterised by only one character, say the income of individuals, we have a
univariate population. A statistical population can be finite or infinite, according to whether it contains a
finite or an infinite number of elements. Not every arbitrary set is a statistical population. For
example, the set of cows on a farm at a particular time does not by itself represent a statistical population,
because it is a collection of animals rather than of measured values.
3) Samples: A sample is a portion of the population which is examined with a view to estimating the
characteristics of the population. For example:
i) To assess the quality of a bag of rice, we examine only a portion of it. The portion selected from the
bag is called a sample, while the whole quantity of rice in the bag is the population.
ii) To estimate the proportion of defective articles in a large consignment, only a portion (i.e., a few of
them) is selected and examined. The selected portion is a sample.
4) Sampling Frame: A sampling frame is the actual set of units from which a sample will be drawn. It is a list
that contains every member of the population from which a sample will be selected. For example, if we
wish to study the underlying factors that cause patients in a given area to be admitted to hospital following
an acute asthmatic attack (the population), then we would need to know the names of all the people
in that area who have been admitted to hospital for this reason.
A good sampling frame should be:
i) Relevant: It should contain things directly linked to the research topic.
ii) Complete: It should cover all relevant items.
iii) Precise: It should exclude all the items that are not relevant.
iv) Up-to-Date: It should incorporate recent additions and changes, and have redundant items cleansed
from the list.

1.1.3. Sampling Design Process

A sampling plan is a detailed outline of which measurements will be taken at what times, on which material, in
what manner, and by whom. Sampling plans should be designed in such a way that the resulting data will
contain a representative sample of the parameters of interest and allow for all questions, as stated in the goals,
to be answered. The steps involved in developing a sampling plan are:
1) Define the Universe: Universe can be confined to a particular type of product, some geographical limits or
some other constraints.
The first problem in any sampling procedure is to define the universe. The target population or universe is
the collection of elements or objects that possess the information sought by the researcher and about which
inferences are to be made. The target population must be defined precisely. Imprecise definition of the
target population will result in research that is ineffective at best and misleading at worst. Defining the
target population involves translating the problem definition into a precise statement of who should and
should not be included in the sample.
The target population should be defined in terms of elements, sampling units, extent and time. An element
is the object about which or from which the information is desired. In survey research, the element is
usually the respondent.
For example, consider a marketing research project assessing consumer response to a new brand of men's
cologne. Who should be included in the target population? All men? Men who have used cologne during the
last month? Men 17 or older? Should females be included, because some women buy colognes for their
husbands? These and similar questions must be resolved before the target population can be appropriately
defined.

2) Sample Frame: After the population to be studied has been specified, the next step is to develop a frame of
this population. A list containing all sampling units of a population is known as a sampling frame. The frame
is either constructed by the researcher for the purpose of the study or consists of some existing list of the
population. A frame does not always have to be a list of names; it can also involve a definite location, a
boundary, an address, or a set of rules by which a sampling unit can be delineated.
A frame is, in some sense, a set of boundaries circumscribing the universe. It may take the form of lists,
indices, maps, directories, population records, electoral rolls, city tax rolls, lists of students enrolled in
a university, etc. A list in which every element of the population appears once and only once would constitute
a sample frame. A good sampling frame should be accurate, free from duplication and conveniently available.
A sample frame is essential for marketing research and for better performance of the sampling procedure.
A sampling frame is a representation of the elements of the target population. It consists of a list or set of
directions for identifying the target population. Examples include the telephone book, an association directory
listing the firms in an industry, a mailing list purchased from a commercial organisation, a city directory, or
a map.
3) Specifying the Sampling Units: The decision on sampling unit often depends on the sampling frame. The
sampling unit is the basic unit containing the elements of the population to be sampled, e.g. city blocks,
households, a business organisation etc. The selection of the sampling unit partially depends on the overall
design of the project also. The units which serve as the basis of initial sampling are known as primary
sampling units. It can be composed of one or more units of the population depending on the objectives of
the inquiry.
For example, suppose that Revlon wanted to assess consumer response to a new line of lipsticks and
wanted to sample females over 18 years of age. It may be possible to sample females over 18 directly, in
which case a sampling unit would be the same as an element. Alternatively, the sampling unit might be
households. In the latter case, households would be sampled and all females over 18 in each selected
household would be interviewed. Here, the sampling unit and the population element are different. Extent
refers to the geographical boundaries and the time factor is the time period under consideration.
4) Selection of Sample Design: It is the procedure of selecting units in the sample. There are two basic methods
of sampling, namely probability and non-probability methods, which can be further divided into
specific methods of selection. A probability sample is one in which each unit has some specific chance of
being included in the sample. In a non-probability sample some arbitrary method of selection not depending
on chance is adopted. This method mainly depends on the purpose of the inquiry, as well as on the attitude or
convenience of the investigators.
The selection of the sample design really involves two decisions:
i) Whether to use a probability or a non-probability method of selection, and
ii) Which specific sample design to use in collecting the data.
The researcher's choice will be affected by the following considerations:
i) If sampling error is to be evaluated, then probability sampling must be used.
ii) To ensure randomness in the selection of units, probability sample should be used.
iii) In the absence of proper sample frame, non-probability sampling should be used.
iv) If time and money considerations are vital, then non-probability sampling should be used.
Once the decision about probability and non-probability method of selection has been made, one should select
the sample design that will best accomplish the objectives of the investigation. Regardless of the design finally
chosen, the researcher may have to defend this design, when the study results are ultimately presented.
5) Determination of Sample Size: The size of the sample has a direct relationship with the degree of accuracy
desired in the investigation. It also depends upon the nature of the population as well as the method of
selection. In marketing research investigations the ideal sample size depends upon the type of the series and
the size of the population. As a common practice, the larger the population, the more units should
be drawn in the sample; and the greater the degree of heterogeneity, the larger the sample should be for it
to be representative.
6) Select the Sample: Selecting the sample means executing the actual sampling process, that is, the actual
selection of the sample elements. This requires a substantial amount of office and field work, particularly when personal
interviews are involved. Execution of the sampling process requires a detailed specification of how the
sampling design decisions with respect to the population, sampling frame, sampling unit, sampling
technique, and sample size are to be implemented. If households are the sampling unit, an operational
definition of a household is needed. Procedures should be specified for vacant housing units and for call
backs in case no one is at home. Detailed information must be provided for all sampling design decisions.

1.1.4. Characteristics of a Good Sample

The various characteristics of a good sample are as follows:


[Figure: Characteristics of a Good Sample. True Representative, Free from Bias, Objective, Accurate,
Comprehensive, Economical, Approachable, Good Size, Feasible, Practical]

1) True Representative: A good sample is a true representative of the population with respect to its
properties. The population is regarded as an aggregate of certain properties, and the sample as a
sub-aggregate of the universe.
2) Free from Bias: A good sample is free from bias. It does not permit prejudices, preconceptions and
imagination to influence its choice.
3) Objective: A good sample is an objective one. This refers to objectivity in the selection procedure, or the
absence of subjective elements from the situation.
4) Accurate: A good sample maintains accuracy. It yields accurate estimates or statistics and keeps errors
to a minimum.
5) Comprehensive: A good sample is comprehensive in nature. This is closely linked with true
representativeness. A comprehensive sample is controlled by the specific purpose of the investigation. A
sample may be comprehensive in traits and yet not be a good representative of the population.
6) Economical: A good sample is economical in terms of energy, time and money.
7) Approachable: The subjects of a good sample are easily approachable. The research tools can be easily
administered to them and data can be easily collected.
8) Good Size: The size of a good sample is such that it yields an accurate result and allows the probability
of error to be estimated.
9) Feasible: A good sample makes the research work more feasible.
10) Practical: A good sample is practicable for the research situation.

1.1.5. Uses of Sampling

The main uses of sampling are as follows:

1) A large population can be satisfactorily covered through sampling.
2) Sampling saves a lot of energy, money and time.
3) When the data are unlimited, this method becomes very handy.
4) Sampling technique is very useful when the units are relatively homogeneous.
5) The use of this method becomes inevitable when 100% accuracy is not required.
6) Sampling makes intensive study possible when the number of individuals to be studied is manageable.

1.1.6. Advantages of Sampling

The advantages of sampling are as follows:


1) Saves Time, Money and Effort: The researcher can save time, money and effort because the subjects
involved are small in number, so less time is needed to calculate, tabulate, present, analyse, and interpret
the data.
2) More Effective: As the size of sample is less than that of population, fatigue in collecting the information is
reduced and therefore more effective work is done by the investigators.
3) Faster and Cheaper: Since the sample is small, the collection, tabulation, presentation, analysis, and
interpretation of data are rapid and less expense is involved.
4) More Accurate: Fewer errors are made because small data are involved in collection, tabulation,
presentation, analysis and interpretation.
5) Gives More Comprehensive Information: A small sample results in a more thorough investigation of the
study, thus, giving more comprehensive information because all the members of the population have been
given an equal chance of being included in the sample.

1.1.7. Disadvantages of Sampling

Disadvantages of sampling are as follows:


1) Biased Selection: Sampling may involve biased selection of respondents on the part of the research worker.
2) Difficulty in Selection: Selection of a truly representative sample is very difficult. A large number of
factors stand in the way of selecting good samples.
3) Specialised Knowledge Needed: Sampling method needs a specialised knowledge and in its absence, the
investigators may commit serious mistakes.
4) Problem of Cooperation: The subjects of the sample may be widely dispersed. Some of them may even
refuse to cooperate with the researcher.
5) Less Accuracy: Sampling is not suitable where a higher standard of accuracy is expected.
6) Limited Nature: Sometimes the universe is so small or heterogeneous that it is not possible to derive a
representative sample. In such a situation census study is the only alternative.

1.2. TYPES OF SAMPLE DESIGNS


There are different types of sample designs based on two factors, viz., the representation basis and the element
selection technique. On the representation basis, the sample may be a probability sample or a
non-probability sample. Probability sampling is based on the concept of random selection, whereas
non-probability sampling is non-random sampling. On the element selection basis, the sample may be either
unrestricted or restricted. When each sample element is drawn individually from the population at large,
the sample so drawn is known as an unrestricted sample, whereas all other forms of sampling are covered under
the term restricted sampling.
Thus, sample designs are basically of two types, viz., probability sampling and non-probability sampling, as
shown in the figure below:
[Figure: Sampling Design, branching into Probability Sampling Design and Non-Probability Sampling Design]

1.2.1. Probability Sampling Design

Probability sampling methods are methods of selecting a sample from the population in which every unit of
the universe has a known, non-zero chance (in the simplest designs, an equal chance) of being included in the
sample. The results obtained from probability or random sampling can be assessed in terms of probability,
i.e., one can measure the errors of estimation or the significance of results obtained from a random sample,
and this fact brings out the superiority of random sampling design over deliberate sampling design.

1.2.1.1. Types of Probability Sampling Design

The various probability sampling designs are as follows:


[Figure: Types of Probability Sampling Design. Simple Random Sampling, Systematic Sampling, Stratified Random
Sampling, Cluster Sampling, Multi-Stage Sampling, Area Sampling]

1.2.1.2. Simple Random Sampling

This is the simplest and most popular technique of sampling. In it each unit of the population has an equal
chance of being included in the sample. This method implies that if N is the size of the population and n units
are to be drawn in the sample, then the sample should be taken in such a way that each of the
$\binom{N}{n}$ (also written ${}^{N}C_{n}$) possible samples has an equal chance of being selected. Simple
random sampling gives:
1) Each element in the population an equal chance of being included in the sample and all choices are
independent of each other.
2) Each possible sample combination an equal chance of being chosen.
The method of simple random sampling eliminates the chance of bias or personal prejudices in the selection of
units.
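As a small worked case (the values N = 5 and n = 2 are illustrative assumptions, not figures from the text), the number of possible samples and the resulting selection probabilities can be written out as follows:

```latex
\binom{N}{n} = \binom{5}{2} = \frac{5!}{2!\,3!} = 10, \qquad
P(\text{a given sample is drawn}) = \frac{1}{10}, \qquad
P(\text{a given unit is included}) = \frac{\binom{4}{1}}{\binom{5}{2}} = \frac{4}{10} = \frac{n}{N}.
```

Each of the 10 possible samples is equally likely, and each unit's chance of inclusion equals the sampling fraction n/N.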
Methods of Simple Random Sampling
Some of the common methods of drawing simple random samples are:
1) Lottery Method: Let N be the size of the population and let a sample of size n be drawn. The steps in this
method are as follows:
i) Number the units of the population from 1 to N.
ii) Take N identical pieces of paper, cards, capsules or balls and number these from 1 to N.
iii) Mix these thoroughly in a bag, bowl or some other container and pick out n items, either one by one or
in one stroke, blindfolded.
iv) The units of the population bearing the numbers on the items drawn in step (iii) will constitute
the desired random sample.
Though the method is simple and easy to apply, it becomes unmanageable when the population is very large or
infinite.
2) By Using Random Numbers: Some experts have constructed random number tables, which help in the
selection of a sample. Of all such tables, Tippett's tables are the most famous and are widely used. These
numbers can be used to select random samples from a given population. A table of true random numbers is
one in which every digit from 0 to 9 has an equal chance of appearing in any position of the table. Random
numbers are most useful when the population is large. The random number table contains many
columns and rows; one of these is selected at random as a starting point, and numbers are then read off
sequentially from the table until the desired sample size is reached. This provides a set of random numbers
without any bias.
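The two methods above can be imitated on a computer. The following is a minimal sketch, assuming a hypothetical frame of units numbered 1 to N and using Python's pseudo-random generator in place of chits or a printed random number table; the function name `simple_random_sample` and the values N = 50 and n = 5 are illustrative only.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame without replacement.

    Every subset of size n is equally likely, which mirrors drawing
    n chits from a well-mixed bowl in the lottery method.
    """
    rng = random.Random(seed)  # a seed is used only to make the draw repeatable
    return rng.sample(frame, n)


# Hypothetical frame: units numbered 1..N, as in step (i) of the lottery method.
N = 50
frame = list(range(1, N + 1))
sample = simple_random_sample(frame, n=5, seed=42)
print(sorted(sample))
```

A pseudo-random generator is a stand-in for a true random number table; for most survey work the difference is immaterial, but that is an assumption of this sketch rather than a claim from the text.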

Advantages of Simple Random Sampling


1) Freedom from Bias: Freedom from human bias and classification error remains one of the biggest
advantages simple random sampling offers, as it gives each member of a population a fair chance of being
selected.
2) Representativeness: If done right, simple random sampling results in a sample highly representative of the
population of interest. In theory, if a researcher has access to all the necessary data about a given
population, only bad luck can compromise the sample's representativeness.
3) Ease of Sampling and Analysis: Other sampling methods require much in-depth research and advance
knowledge of a population prior to the selection of subjects. In simple random sampling, only the complete
listing of the elements in a population (known as the sampling frame) is needed. A simple random sample,
being highly representative of a population, also simplifies data interpretation and analysis of results.
Disadvantages of Simple Random Sampling
1) Cost: One factor limiting the use of simple random sampling is the cost. Because the method guarantees
that every possible item in the universe has the same chance of being chosen, the actual sample selected
often consists of universe items that are widely dispersed geographically. If personal interviews are used,
interviewers may have to travel considerable distances, thereby increasing the costs of the field operation.
2) Availability of a Current Listing of Universe Elements: A second serious limitation to practical use of
simple random sampling is the need for an accurate list of universe elements.
3) Statistical Efficiency: A third difficulty associated with simple random sampling is that it is often possible
to get statistically more efficient sample. One sample design is said to be statistically more efficient than
another when, for the same size sample, a smaller standard error is obtained.
4) Administrative Difficulties: A number of difficulties are associated with the administration of simple
random samples. One is the conceptually simple, but sometimes troublesome, problem of selecting the
sample. Another administrative problem in simple random sampling is the difficulty of maintaining
supervisory control when using in-home personal interviews.

1.2.1.3. Systematic Sampling

In this sampling, one unit is selected at random from the universe and the other units are at a specified interval
from the selected unit. This method can be used when the population is finite and the units of the Universe can
be arranged on the basis of any system like alphabetical arrangement, numerical arrangement or geographical
arrangement etc.
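The selection rule described above can be sketched as follows. This is a minimal illustration, assuming the interval is taken as k = N // n and the random start is chosen within the first interval; the function name `systematic_sample` and the frame of 100 numbered units are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select a random start, then every k-th unit of an ordered frame."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")
    k = N // n                 # sampling interval (assumed rule: integer part of N/n)
    rng = random.Random(seed)
    start = rng.randrange(k)   # random start among the first k positions
    return [frame[start + i * k] for i in range(n)]


# Hypothetical ordered frame of 100 units; a sample of 10 gives an interval of 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, n=10, seed=1)
print(sample)
```

When N is not an exact multiple of n, this simple rule never reaches the last few units of the frame; more careful variants exist, but they are beyond what the text describes.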
Advantages of Systematic Sampling
1) Simple and Convenient: This system is very simple and convenient and results obtained by it are generally
satisfactory.
2) Gives Similar Results: If the population is large, systematic sampling gives results similar to those
obtained by proportionate stratified random sampling.
3) Independent: This technique of selection is independent of the property of the universe which is under
study and so it gives quite a representative sample.
4) Little Chance of Bias: There is little chance of bias creeping into the sample.
5) Helps in Random Selection: It is quite popular in the random draw of prizes and selection of a contesting
candidate in a tie position.
Disadvantages of Systematic Sampling
1) High Sampling Error: The systematic sampling may select only one cluster of the population although
there may be several clusters in the universe. The sampling error, therefore, may be very high and cannot be
evaluated properly.
2) Possibility of Selecting Impracticable Units: There may be a possibility of selecting impracticable units
of the population. In a practical situation, it may not be easy to determine whether a periodicity is present
in the ordering of the units or to evaluate its significance.

3) Biased: The selection may be affected by bias of the drawer if all the chits are not folded identically and are
not of identical size, shape and colour.
4) Not Suitable for Large Population: It is not suitable for a large sized universe as it will be very difficult to
write down the names or particulars of a large number of units on the chits of paper.

1.2.1.4. Stratified Random Sampling

A stratified random sample is one in which random selection is done not from the heterogeneous universe as a
whole but from different homogeneous parts or strata of a universe.
This sampling procedure may be summarised as follows:
1) The universe to be sampled is divided (or stratified) into groups that are mutually exclusive and include all
items in the universe.
2) A simple random sample is then chosen independently from each group or stratum.
The process of stratified random sampling differs from simple random sampling in that, with the latter, sample
items are chosen at random from the entire universe. In stratified random sampling, the sample is designed so
that a separate random sample is chosen from each stratum. In simple random sampling the distribution of the
sample among strata is left entirely to chance.
Formally, divide the population into non-overlapping groups (i.e., strata) of sizes
$N_1, N_2, \ldots, N_i$ such that $N_1 + N_2 + \cdots + N_i = N$.
Then draw a simple random sample with sampling fraction $f = n/N$ in each stratum.

For example, suppose a researcher wishes to study retail sales of a product such as wheat in a universe of
100,000 grocery stores. The researcher might first subdivide this universe into three strata, based on store size,
as illustrated below:
Store Size Stratum    Number of Stores    Percentage of Stores
Large stores          20,000              20
Medium stores         30,000              30
Small stores          50,000              50
Total                 100,000             100

Then, by random sampling independently within each of the three strata, the researcher could guarantee the
desired sample allocation of stores within each size group, instead of leaving their representation to chance.
For example, suppose it was desirable to have a total sample of 120 stores. With the stratification scheme
shown, the researcher would draw a simple random sample from each stratum in proportion to its size:
24 large stores (20 percent of 120), 36 medium stores (30 percent of 120) and 60 small
stores (50 percent of 120). Thus all the strata would be represented in the sample. In simple
random sampling, by contrast, every unit has an equal chance of selection, so it may happen that a small
stratum gets little or no representation at all.
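The allocation in the grocery store example can be reproduced with a short sketch. It assumes proportionate allocation (the same sampling fraction in every stratum) and identifies stores by hypothetical index numbers within each stratum; the function names are illustrative, not part of any library.

```python
import random


def proportionate_allocation(stratum_sizes, n):
    """Split a total sample of n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    # Rounding is adequate here; in general the rounded counts may not sum exactly to n.
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}


def stratified_sample(stratum_sizes, n, seed=None):
    """Draw an independent simple random sample inside each stratum."""
    rng = random.Random(seed)
    allocation = proportionate_allocation(stratum_sizes, n)
    # Stores are identified by hypothetical index numbers 0..size-1 within a stratum.
    return {
        name: rng.sample(range(stratum_sizes[name]), k)
        for name, k in allocation.items()
    }


strata = {"large": 20_000, "medium": 30_000, "small": 50_000}
allocation = proportionate_allocation(strata, 120)
print(allocation)  # {'large': 24, 'medium': 36, 'small': 60}

sample = stratified_sample(strata, 120, seed=0)
```

Every stratum is guaranteed its share of the 120 stores, which is the property that simple random sampling cannot promise.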
Advantages of Stratified Probability Sampling
1) More Representative: Stratified sampling is more representative, particularly when the universe is
heterogeneous. By stratifying the universe into homogeneous groups, the element of heterogeneity is reduced in
the sense that different strata take care of the heterogeneity of the universe, and the chance of any
particular element of heterogeneity being prominent in the sample is ruled out. When the same sampling
fraction (f) is used within every stratum, the design is called proportionate stratified random sampling.
When different sampling fractions (f) are used within the strata, it is called disproportionate stratified
random sampling.

2) Certainty: This type of sampling balances the uncertainty of random sampling against bias of deliberate
selection.
3) Greater Precision: Since variability in each stratum is reduced, stratified random sampling provides more
precise estimates than those provided by simple random sampling.
4) Administrative Convenience: Division of the universe into different strata or sub-groups results in
administrative convenience.
Disadvantages of Stratified Probability Sampling
1) Needs More Attention: Stratified sampling designs can be either proportionate or disproportionate. In
proportionate sampling, the sample size is proportional to the stratum size. As a result, there is a higher
precision level, which is magnified when each stratum is homogeneous. Disproportionate stratification provides
for a varying sample size for each stratum. The criteria used to allocate the sample among strata will
determine whether the precision of the design is excellent or poor. It is best suited for strata with varying
characteristics, but it can only optimise the accuracy of one study and this cannot be transferred to
subsequent surveys.
2) Time Consuming: The method involves seven steps in coming up with the sample, making it a lengthy
process. It also requires that a record of the population being studied is made available. At times, the list is
not obtainable, and developing it makes the work harder since the strata must be collectively exhaustive and
mutually exclusive. As a result, the sample size is increased, leading to extra expense and extended time of
study.
3) Complicated: Decisions on stratification are made prior to the study. If the choices made are wrong, the
information collected becomes invalid for use in drawing conclusions. Analysing the data is also complex
because you have to consider the number and size of strata population, size of total population and sample
population.
4) Expensive: Design use can call for a large sample size, which increases the cost, especially in cases where
the lists needed are classified and have to be bought. In other instances, the population lists can be
accessible, but the people are geographically dispersed. Necessary arrangements have to be made to reach
them, adding an extra cost.

1.2.1.5. Cluster Sampling

In this method, the universe is divided into some recognizable sub-groups which are called clusters. After this a
simple random sample of these clusters is drawn and then all the units belonging to the selected clusters
constitute the sample. For example, if we have to conduct an opinion poll in the city of Delhi, then the city may
be divided into, say, 50 blocks and out of these 50 blocks 5 blocks can be picked up by random sampling and
the people in these five blocks can be interviewed to give their opinion on a particular issue.
While using this method, it should be seen that clusters are as small in size as possible and the number of
sample units in each cluster should be more or less the same. This method is commonly used in collecting data
about some common characteristics of the population.
Cluster sampling, no doubt, reduces cost by concentrating surveys in selected clusters. But certainly it is less
precise than random sampling. There is also not as much information in n observations within a cluster as
there happens to be in n randomly drawn observations. Cluster sampling is used only because of the economic
advantage it possesses; estimates based on cluster samples are usually more reliable per unit cost.
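The Delhi example above can be sketched in a few lines of Python. This is only an illustrative sketch: the frame of 50 blocks with 20 residents each, the function name cluster_sample and the seed are all assumptions made for the example, not part of any survey described here.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: draw a simple random sample of clusters,
    then keep every unit that belongs to the selected clusters."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    units = [unit for cid in chosen_ids for unit in clusters[cid]]
    return chosen_ids, units


# Hypothetical frame: 50 blocks, each holding 20 residents labelled (block, resident).
blocks = {b: [(b, r) for r in range(20)] for b in range(1, 51)}

chosen_blocks, respondents = cluster_sample(blocks, n_clusters=5, seed=42)
print(sorted(chosen_blocks))   # the 5 randomly selected blocks
print(len(respondents))        # 5 blocks x 20 residents = 100
```

Only the clusters are chosen at random; once a block is selected, everyone in it is interviewed, which is why costs fall while precision may suffer.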
Difference between Stratified and Cluster Sampling
Stratified Sampling
1) One divides the population into a few sub-groups:
i) Each sub-group has many elements in it.
ii) Sub-groups are selected according to some criterion that is related to the variables under study.
2) One tries to secure homogeneity within sub-groups.
3) One tries to secure heterogeneity between sub-groups.
4) One randomly chooses elements from within each sub-group.

Cluster Sampling
1) One divides the population into many sub-groups:
i) Each sub-group has few elements in it.
ii) Sub-groups are selected according to some criterion of ease or availability in data collection.
2) One tries to secure heterogeneity within sub-groups.
3) One tries to secure homogeneity between sub-groups.
4) One randomly chooses several sub-groups, which one then typically studies in depth.


Advantages of Cluster Sampling


1) Cheap, Quick and Easy: This sampling technique is cheap, quick and easy. Instead of sampling an entire
country when using simple random sampling, the researcher can allocate his limited resources to the few
randomly selected clusters or areas when using cluster samples.
2) Larger Sample Size: Related to the first advantage, the researcher can also increase his sample size with
this technique. Considering that the researcher will only have to take the sample from a number of areas or
clusters, he can then select more subjects since they are more accessible.
3) Convenient to Obtain: Clusters are usually convenient to obtain, and the cost of sampling from the entire
population is reduced because the scope of the study is reduced to the clusters.
4) Cost Effective: The cluster sampling method is widely used in marketing research due to its overall
cost-effectiveness and feasibility of implementation, especially in area sampling situations.
Disadvantages of Cluster Sampling
1) Least Representative: Of all the different types of probability sampling, this technique is the least
representative of the population. Individuals within a cluster tend to have similar characteristics, and with
a cluster sample there is a chance that the researcher will have an overrepresented or underrepresented
cluster, which can skew the results of the study.
2) High Sampling Error: This is also a probability sampling technique with a possibility of high sampling
error. This is brought about by the limited number of clusters included in the sample, which leaves a
significant proportion of the population unsampled.
3) Less Efficient: If the elements of a cluster are similar, cluster sampling may be statistically less efficient
than simple random sampling.
4) Sometimes not Appropriate: When the elements of a cluster are the same, sampling from the cluster may
be no better than sampling a single unit from the cluster.

1.2.1.6.

Multi-Stage Sampling

This is a modified form of cluster sampling. While in cluster sampling all the units in a selected cluster
constitute the sample, in multistage sampling the sample units are selected in two or three or four stages. In this
system the universe is first divided into first-stage sample units, from which the sample is selected. The selected
first-stage samples are then sub-divided into second-stage units from which another sample is selected.
Third-stage and fourth-stage sampling is done in the same manner if necessary. Thus, for an urban survey, a sample of
towns may be taken first and then for each of the selected towns a sub-sample of households may be taken, and
then, if need be, from each of the selected households a third-stage sample of individuals may be obtained.
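The urban survey described above (towns, then households, then individuals) can be sketched as follows. The universe of 10 towns, 30 households per town and 4 members per household, as well as the function name multistage_sample, are hypothetical and serve only to show the nesting of the stages.

```python
import random


def multistage_sample(towns, n_towns, n_households, n_individuals, seed=None):
    """Three-stage sampling: towns -> households -> individuals.
    A fresh random sample is drawn at every stage, but only inside
    the units selected at the preceding stage."""
    rng = random.Random(seed)
    selected = []
    for town in rng.sample(sorted(towns), n_towns):                 # stage 1
        households = towns[town]
        k_hh = min(n_households, len(households))
        for hh in rng.sample(sorted(households), k_hh):             # stage 2
            members = households[hh]
            k_ind = min(n_individuals, len(members))
            for person in rng.sample(members, k_ind):               # stage 3
                selected.append((town, hh, person))
    return selected


# Hypothetical universe: 10 towns, 30 households per town, 4 members per household.
towns = {
    f"town{t}": {f"hh{h}": [f"p{m}" for m in range(4)] for h in range(30)}
    for t in range(10)
}

sample = multistage_sample(towns, n_towns=3, n_households=5, n_individuals=1, seed=1)
print(len(sample))  # 3 towns x 5 households x 1 person = 15
```

Note that a list of households is needed only for the three selected towns, which is the source of the time and cost savings claimed for this method.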
Advantages of Multi-Stage Sampling
1) Flexible: It is very flexible as compared to other methods of sampling.
2) Saves Time: In this method, the subsequent stages of samples are needed only for a limited number of units
i.e., for those only which were selected in the preceding stages. As such it saves a lot of time, energy and
cost.
3) Administrative Efficiency: It is easier to administer than most single stage designs mainly because of the
fact that sampling frame under multi-stage sampling is developed in partial units.
4) Helps in Survey of Undeveloped Areas: It is of great utility in surveys of undeveloped areas where no
up-to-date and accurate frame is available for subdivision of the materials into reasonably small sample units.
5) Sampling of Large Units: A large number of units can be sampled for a given cost under multistage
sampling because of sequential clustering, whereas this is not possible in most of the simple designs.
Disadvantages of Multi-Stage Sampling
1) Large Number of Errors: It is likely to cause a large number of errors as it involves a process of divisions
and sub-divisions of the various strata or clusters in different stages.
2) Greater Variability: It generally leads to greater variability of the estimates than single-stage methods of sampling.
3) Less Efficient: In general, it is less efficient than a suitable single stage random sampling.

1.2.1.7.

Area Sampling

Area sampling is a form of multi-stage sampling in which maps, rather than lists or registers, are used as the
sampling frame. It is more frequently used in those countries which do not have a satisfactory sampling frame
such as population lists.
If clusters happen to be some geographic subdivisions, in that case cluster sampling is better known as area
sampling. In other words, cluster designs, where the primary sampling unit represents a cluster of units based on
geographic area, are distinguished as area sampling. The plus and minus points of cluster sampling are also
applicable to area sampling.
The overall area for sampling is divided into several smaller areas within which a random sample is selected.
For example, a city map may be used for area sampling. The various blocks provide the frame, and each of them is
numbered and used for the sampling. For sampling blocks, stratification is employed, which is based on
geographical considerations. Thus blocks need to be identified and then a stratified sample of dwellings
can be selected. Finally, blocks are subdivided into segments of more or less equal size, and a sample of these
segments may be taken.
Advantages of Area Sampling
1) Convenience: Clusters are usually convenient to obtain, and the cost of sampling from the entire
population is reduced because the scope of the study is reduced to the clusters.
2) Cost: The cost per element is usually lower in area sampling than in stratified sampling because of lower
element listing or locating costs.
3) Feasible: Sometimes area sampling is the only feasible approach because the sampling frames of the
individual elements of the population are unavailable and therefore other random sampling techniques cannot
be used.
Disadvantages of Area Sampling
1) Similar Elements: If the elements of an area are similar, area sampling may be statistically less efficient than
simple random sampling. In an extreme case, when the elements of an area are the same, sampling from
that area may be no better than sampling a single unit from the area.
2) Costly: The costs and problems of statistical analysis are greater with area sampling than with simple
random sampling.

1.2.1.8.

Advantages of Probability Sampling

Advantages of probability sampling are as follows:


1) Unbiased Estimates: Random (Probability) sampling is the only sampling method that provides essentially
unbiased estimates having measurable precision. If the investigator requires this level of objectivity, then
some variant of probability sampling is essential.
2) Relative Efficiency: Random sampling permits the researcher to evaluate, in quantitative terms, the
relative efficiency of alternate sampling techniques in a given situation. Usually this is not possible in
non-probability sampling.
3) Less Universe Knowledge Required: This requires relatively little universe knowledge. Essentially, only
two things are needed to be known:
i) A way of identifying each universe element uniquely, and
ii) The total number of universe elements.
4) Fair: Every item in the population has a known, non-zero chance of being selected and measured (an equal
chance in the case of simple random sampling).
5) Easy: It allows easy data analysis and error calculation.
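Point 3 above can be illustrated with a short Python sketch: if every universe element carries a unique serial number and the total number of elements is known, a simple random sample can be drawn without any further knowledge of the universe. The universe size of 1,000 and the function name simple_random_sample are assumptions made for the illustration.

```python
import random


def simple_random_sample(universe_size, sample_size, seed=None):
    """Draw serial numbers 1..universe_size without replacement,
    so that every element has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(range(1, universe_size + 1), sample_size)


# Hypothetical universe of 1,000 uniquely numbered elements.
ids = simple_random_sample(universe_size=1000, sample_size=10, seed=7)
print(sorted(ids))
```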

1.2.1.9.

Disadvantages of Probability Sampling

Following are the disadvantages of probability sampling:


1) Less Efficient: It is less statistically efficient than other sampling methods.

2) Non-Utilisation of Additional Knowledge: It does not make use of additional knowledge of how the
population is structured.
3) Complex and Time Consuming: The method of selection in many cases can be complex and time
consuming. Especially in the cases of marketing research, the constraints of budget and time may give
preference to non-probability methods of sampling.
4) High Level Skills: Probability sampling requires a very high level of skill and experience for its use.
5) More Time Required: It requires a lot of time to plan and execute a probability sample.
6) High Costs: The costs involved in probability sampling are generally large as compared to non-probability
sampling.

1.2.2.

Non-Probability Sampling Design

Non-probability sampling is that sampling procedure which does not afford any basis for estimating the
probability that each item in the population has been included in the sample. Non-probability sampling is also
known by different names such as deliberate sampling, purposive sampling and judgment sampling. In this type
of sampling, items for the sample are selected deliberately by the researcher; his choice concerning the items
remains supreme. In other words, under non-probability sampling the organisers of the inquiry purposively
choose the particular units of the universe for constituting a sample on the basis that the small mass that they
select out of a huge one will be typical or representative of the whole.

1.2.2.1.

Types of Non-Probability Sampling Design

The various non-probability sampling designs are:
1) Convenience Sampling
2) Purposive Sampling
3) Panel Sampling
4) Snowball Sampling

1.2.2.2.

Convenience Sampling

In convenience sampling, the researcher chooses the sampling units on the basis of convenience or
accessibility. Such samples are also called accidental samples because the sample units enter by accident. This
is also known as a sample of the man in the street, i.e., selection of units where they are. Sample units are
selected because they are accessible. For example, in testing a potential new product, the sample work is done by
adding the new product to the appropriate shops in the locality. Purchasing and selling of the new product is
observed there.
Advantages of Convenience Sampling
1) Economical: It is less costly and less time consuming.
2) Proper Representation: It ensures proper representation of the universe when the investigator has full
knowledge of the composition of the universe and is free from bias.
3) Avoid Irrelevant Items: It prevents unnecessary and irrelevant items entering into the sample by chance.
4) Intensive Study: It ensures intensive study of the selected items.
5) Accurate Results: It gives better results if the investigator is unbiased and has the capacity of keen
observation and sound judgment.
Disadvantages of Convenience Sampling
1) Personal Bias: There is enough scope for the bias or prejudices of the investigator to play a part and
influence the selection.
2) No Equal Chance: There is no equal chance for all the items of the universe being included in the sample.

3) No Degree of Accuracy: There is no possibility of having any idea about the degree of accuracy achieved
in the investigation conducted by this method.
4) No Possibility of Sample Error: There is no possibility of calculating the sample error, the idea of which is
based on mathematical concepts which are not applicable to non-random methods of sampling.
5) Unsuitable for Large Samples: This method is not suitable for the large samples where the size of both the
universe and the sample is considerably large.

1.2.2.3.

Purposive Sampling

A non-probability sample that conforms to certain criteria is called a purposive sample. Purposive sampling is
basically of two types:
1) Judgment Sampling
2) Quota Sampling
1.2.2.3.1. Judgment Sampling
In judgment sampling, the researcher or some other expert uses his/her judgment in selecting the units from
the population for study based on the population's parameters.
This type of sampling technique might be the most appropriate if the population to be studied is difficult to
locate or if some members are thought to be better (more knowledgeable, more willing, etc.) than others to
interview.
For example, a group of sales managers might select a sample of grocery stores in a city that they regarded as
representative. This approach has been found empirically to produce unsatisfactory results. And, of course,
there is no objective way of evaluating the precision of sample results. Despite these limitations, this method
may be useful when the total sample size is extremely small.
Advantages of Judgment Sampling
1) Suitable for Small Sampling Units: When only a small number of sampling units are in the universe,
simple random selection may miss the more important elements, whereas judgment selection would
certainly include them in the sample.
2) Studying Unknown Traits of Population: When we want to study some unknown traits of a population,
some of whose characteristics are known; we may then stratify the population according to these known
properties and select sampling units from each stratum on the basis of judgment. This method is used to
obtain a more representative sample.
3) Solving Everyday Business Problems: In solving everyday business problems and making public policy
decisions, executives and public officials are often pressed for time and cannot wait for probability sample
designs. Judgment sampling is then the only practical method to arrive at solutions to their urgent problems.
Disadvantages of Judgment Sampling
1) Non-Scientific: This method is not scientific because the population units to be sampled may be affected by
the personal prejudice or bias of the investigator. Thus, judgment sampling involves the risk that the
investigator may establish foregone conclusions by including those items in the sample which conform to
his preconceived notions.
2) No Method to Calculate Sampling Error: There is no objective way of evaluating the reliability of
sample results. The success of this method depends upon the excellence in judgment. If the individual
making decisions is knowledgeable about the population and has good judgment, then the resulting sample
may be representative, otherwise the inferences based on the sample may be erroneous. It may be noted that
even if a judgment sample is reasonably representative, there is no objective method for determining the
size or likelihood of sampling error. This is a big defect of the method.
1.2.2.3.2. Quota Sampling
One of the most commonly used non-probability sample designs is quota sampling, which enjoys its most
widespread use in consumer surveys. This sampling method also uses the principle of stratification. As in
stratified random sampling, the researcher begins by constructing strata. Bases for stratification in consumer
surveys are commonly demographic, e.g., age, gender, income and so on. Often compound stratification is used,
for example, age groups within gender.
Next, sample sizes (called quotas) are established for each stratum. As with stratified random sampling, the
sampling within strata may be proportional or disproportional. Field-workers are then instructed to conduct
interviews with the designated quotas, with the identification of individual respondents being left to the
field-workers.
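The procedure just described can be sketched in Python. The respondents, the compound strata (age group within gender) and the quota of one interview per stratum are hypothetical, as is the function name quota_sample. The sketch assumes the field-worker simply accepts people in the order in which they are met.

```python
def quota_sample(respondents, quotas, stratum_of):
    """Accept respondents in the order they are met until every
    stratum's quota is filled; no random selection is involved."""
    remaining = dict(quotas)
    sample = []
    for person in respondents:
        stratum = stratum_of(person)
        if remaining.get(stratum, 0) > 0:
            sample.append(person)
            remaining[stratum] -= 1
        if not any(remaining.values()):   # all quotas filled
            break
    return sample


# Hypothetical passers-by in the order the field-worker meets them.
passers_by = [
    {"id": 1, "gender": "F", "age": "18-34"},
    {"id": 2, "gender": "F", "age": "18-34"},
    {"id": 3, "gender": "M", "age": "35+"},
    {"id": 4, "gender": "F", "age": "35+"},
    {"id": 5, "gender": "M", "age": "35+"},
    {"id": 6, "gender": "M", "age": "18-34"},
    {"id": 7, "gender": "F", "age": "35+"},
]

# Compound stratification: age group within gender, one interview per stratum.
quotas = {("F", "18-34"): 1, ("F", "35+"): 1, ("M", "18-34"): 1, ("M", "35+"): 1}

interviewed = quota_sample(passers_by, quotas, lambda p: (p["gender"], p["age"]))
print([p["id"] for p in interviewed])  # [1, 3, 4, 6]
```

Who ends up in the sample depends entirely on the order in which people happen to be met, which is why standard errors cannot be calculated for such a sample.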
Advantages of Quota Sampling
1) Economical: It is economical as travelling costs can be reduced. An interviewer need not travel all over a
town to track down pre-selected respondents. However, if numerous controls are employed in a quota
sample, it will become more expensive though it will have less selection bias.
2) Administratively Convenient: It is administratively convenient. The labour of selecting a random sample
can be avoided by using quota sampling. Also, the problem of non-contacts and call-backs can be dispensed
with altogether.
3) Minimum Memory Errors: When the field work is to be done quickly, perhaps in order to minimize
memory errors, quota sampling is most appropriate and feasible.
4) Independent: It is independent of the existence of sampling frames. Wherever a suitable sampling frame is
not available, quota sampling is perhaps the only choice available.
Disadvantages of Quota Sampling
1) Difficulty in Calculating Standard Errors: Since quota sampling is not based on random selection, it is
not possible to calculate estimates of standard errors for the sample results.
2) Difficulty in Obtaining Representative Sample: It may not be possible to get a representative sample
within the quota as the selection depends entirely on the mood and convenience of the interviewers.
3) Hampers Quality of Work: Since too much latitude is given to the interviewers, the quality of work
suffers if they are not competent.
4) Difficult to Supervise and Control: It may be extremely difficult to supervise and control the field
investigation under quota sampling.

1.2.2.4.

Panel Sampling

Panel sampling is the method of first selecting a group of participants through a random sampling method and
then asking that group for the same information again several times over a period of time. It is a
semi-permanent sample where members may be included repetitively for successive studies. Here there is a facility to
select and quickly contact such well-balanced samples and to have relatively high response rate even by mail.
Advantages of Panel Sampling
1) Saves Cost and Time: Lesser cost and time involved in the collection of information.
2) Helps in Measuring Changes: Due to fixed sample units, one can measure the changes in repeated
reporting.
3) Helps in Tracing Shift in Behaviour: Shifts of behaviour over time can be traced.
Disadvantages of Panel Sampling
1) Not Representative: The sample under panel sampling may not be fully representative.
2) Members Become Conditioned: The members of the panel may become conditioned to some specific
situations.
3) Difficult to Preserve Representative Character of Panel: It may be difficult to preserve the
representative character of the panel for a long time as the professional members of the panel may drop out
voluntarily and may need replacement.

1.2.2.5.

Snowball Sampling

It is a special non-probability method used when the desired sample characteristic is rare. It may be extremely
difficult or cost prohibitive to locate respondents in these situations. Snowball sampling relies on referrals from
initial subjects to generate additional subjects. While this technique can dramatically lower search costs, it
comes at the expense of introducing bias because the technique itself reduces the likelihood that the sample will
represent a good cross section of the population.
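The referral mechanism can be sketched as a walk over a referral network. The network below, the seed respondent "A" and the function name snowball_sample are hypothetical; the sketch assumes each respondent names everyone he or she can refer.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Start from the initial subjects and follow their referrals
    until max_size respondents have been recruited."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:      # do not recruit anyone twice
                seen.add(referred)
                queue.append(referred)
    return sample


# Hypothetical referral network; G and H belong to a separate social circle.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["F"],
    "G": ["H"],
}

recruited = snowball_sample(referrals, seeds=["A"], max_size=5)
print(recruited)  # ['A', 'B', 'C', 'D', 'E']
```

Respondents G and H can never be reached from the initial subject, which illustrates how the technique can miss whole sections of the population.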
Advantages of Snowball Sampling
1) Identifying and Selecting Prospective Respondents: Snowball sampling is a reasonable method of
identifying and selecting prospective respondents who are members of small, hard-to-reach, uniquely
defined target populations.
2) Useful in Qualitative Research: As a non-probability sampling method, it is most useful in qualitative
research practices, like focus group interviews.
3) Needs Little Planning: This sampling technique needs little planning and a smaller workforce compared to
other sampling techniques.
4) Less Costly: Reduced sample sizes and costs are primary advantages to this sampling method.
Disadvantages of Snowball Sampling
1) Biased: Snowball sampling allows bias to enter the overall research study. If there are significant
differences between those people who are known within certain social circles and those who are not, the
sample will not represent the latter group.
2) Limited Data Structure: Like all other non-probability sampling approaches, data structures are limited
and cannot be used to generalise the results to members of the larger defined target population.
3) Limited Control: The researcher has little control over the sampling method. The subjects that the
researcher can obtain rely mainly on the previous subjects that were observed.
4) Researcher has no Idea of Distribution: Representativeness of the sample is not guaranteed. The
researcher has no idea of the true distribution of the population and of the sample.

1.2.3.

Difference between Probability and Non-probability Sampling

1) Control
Probability Sampling: Sampling error can be controlled.
Non-Probability Sampling: Sampling error cannot be controlled.
2) Chances of Selection Bias
Probability Sampling: The selection process depends on the specific technique and is, therefore, not
influenced by the expertise of the researcher.
Non-Probability Sampling: Selection bias can be very high.
3) Economy
Probability Sampling: Time and costs involved may be high.
Non-Probability Sampling: Usually a low-cost, quicker alternative.
4) Reliability
Probability Sampling: It is possible to test the hypotheses through formal, rigorous tests of significance and,
thus, obtain more reliable results.
Non-Probability Sampling: Parametric tests of significance are not applicable; the reliability of results is,
therefore, not very high.
5) Suitability
Probability Sampling: More reliable and representative if the population is heterogeneous.
Non-Probability Sampling: May be more useful in a homogeneous population.
6) Usefulness
Probability Sampling: Preferable if complex, detailed estimates of parameters are required.
Non-Probability Sampling: Reasonably useful if the parameters to be estimated are at broad, aggregated levels,
such as market shares or total sales.
7) Degree of Accuracy
Probability Sampling: Accuracy may be poor if the population is large.
Non-Probability Sampling: Accuracy in such situations is quite scattered.
8) Sampling Frame
Probability Sampling: Formal sampling frames required.
Non-Probability Sampling: Can be effective even in the absence of an elaborate sampling frame.
9) Convenience
Probability Sampling: May be very inconvenient if the geographical spread of the population is high.
Non-Probability Sampling: More convenient, less time-consuming, cheaper and likely to have lower
non-sampling errors.

1.2.4.

Criteria for Selection of Probability and Non-Probability Sampling

The following are some of the considerations for the selection of probability and non-probability sampling:
1) Sometimes random sampling may be difficult to execute in practice. In investigations where non-response
is quite high, the use of random sampling loses its significance and non-probability sampling may be more
appropriate.
2) When the inquiry is to be given more objectivity and the accuracy of the investigation is to be ensured at
some desired level, the only alternative is probability sampling.
3) Since randomness is not involved in non-probability methods, an estimate of sampling error cannot be made.
4) When there are constraints of time, money and availability of an appropriate sampling frame, the only
course is non-probability sampling. Especially in situations where more resources are to be devoted to the
accurate collection of information than to the selection of the sample units, e.g., consumer preferences,
non-probability sampling is more appropriate.
The overall choice between random and non-random sampling methods is a difficult exercise. The control
and measurement of sampling error inherent in random sampling techniques should not be the sole
determinant of the sampling procedure. The final choice should depend on some informal judgment. The objectives
of the inquiry, the nature of the universe and the resources available at one's disposal may be guiding factors in
the selection of an appropriate sampling methodology.

1.3. STATISTICAL INFERENCE


Statistical inference is that branch of statistics which is concerned with using probability concepts to deal with
uncertainty in decision-making. The field of statistical inference has had a fruitful development since the latter
half of the 19th century.
In business, there arise several situations when managers have to make quick estimates. Since their estimates
have an impact on the success or failure of their enterprises, they have to take sufficient care to ensure that their
estimates are not far away from the final outcome. The point to note is that such estimates are made without
complete information and with a great deal of uncertainty about the eventual outcome.
In all such situations, it is the theory of probability that forms the basis for statistical inference. The term
statistical inference means making a probability judgment concerning a population on the basis of one or
more samples. Based on probability theory, statistical inferences are made as a basis for making decisions.
For example, an investor is interested to know whether he should subscribe to an investment consultancy
service or not. On the basis of a sample, he has to examine whether the selection of his investments on the advice
of the investment consultancy service has been more profitable than a random selection. If it has, he may go in
for this service.
Likewise, a quality control engineer while examining the control chart finds that the production process has
gone out of control. He may then look for the possible sources that have led to this situation. He may then take
corrective measures to restore the production process under control.
Statistical inference treats two different classes of problems:
1) Estimation, i.e., to use the statistics obtained from the sample as an estimate of the unknown parameter of
the population from which the sample is drawn.
2) Hypothesis testing, i.e., to test some hypothesis about the parent population from which the sample is drawn.
In both these cases the particular problem at hand is structured in such a way that inferences about relevant
population values can be made from sample data.
Statistical Inference: 1) Estimation Theory, 2) Hypothesis Testing

1.4. ESTIMATION THEORY


Estimation theory, as the name itself suggests, refers to the techniques and methods by which population
parameters are estimated from sample studies. Estimation of parameters is absolutely essential whenever a
sample study has been conducted.
For example, a manufacturer would like to have some estimate of the future demand for his product, a
businessman would like to estimate his future sales and profits, a production engineer would very much wish to
know the percentage of defective articles which his machine is likely to produce over a period of time, the
manufacturer of motor tyres would like to know the approximate life of his tyres, a bulb manufacturer would
be interested to know about the length of life of the bulbs and so on.

1.4.1.

Estimator and Estimate

When one makes an estimate of a population parameter, a sample statistic is used. This sample statistic is an
estimator.
For example, the sample mean $\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$ is a point estimator of the population
mean $\mu$. Many different statistics can be used to estimate the same parameter.
A statistical estimator is a function of the N observed values, $X_1, X_2, \ldots, X_N$, sampled from a random
variable X. An estimator is, therefore, also a random variable.
Criteria of Good Estimator
A good estimator must possess the following properties:
1) Unbiasedness: An estimator is unbiased if, on average, its value equals the true value of the parameter. Let
$\theta$ denote the population parameter of interest, such as the population mean. An estimator $\hat{\theta}$
of a population parameter $\theta$ is said to be unbiased if the expected value of the estimator is equal to the
population parameter. That is, $\hat{\theta}$ is unbiased if $E(\hat{\theta}) = \theta$.
For example, the sample mean $\bar{X}$ is an unbiased estimator. Given a random sample, the expected value of
$\bar{X}$ is $\mu$, the same value one is trying to estimate.
2) Consistency: An estimator is said to be consistent if, with an increase in the sample size, its value (statistic)
comes closer and closer to the parameter value.
For example, if the sample mean $\bar{X}$ comes closer to the parameter value of the mean $\mu$ as the sample
size increases, it would be said that the estimator is consistent.
3) Efficiency: In many cases there can be more than one unbiased and consistent estimator of the parameter
value. For example, in a normal distribution both the mean and median are unbiased and consistent
estimators of the parameter mean. However, the variance of the sampling distribution of the mean is less
than the variance of the sampling distribution of the median, and for this reason the mean is considered to
be a more efficient estimator than the median.
4) Sufficiency: A statistic is said to be a sufficient estimator of the parameter if it contains all the information
in the sample about the parameter. If all the information that a sample can provide about the parameter has
been utilized by an estimator, it would be termed a sufficient estimator.
If there is a sufficient estimator for the parameter, it would also be the most efficient and the most consistent
estimator. It, however, need not be an unbiased estimator.
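The unbiasedness and efficiency criteria above can be checked with a small simulation. The following Python sketch is only illustrative: the normal population with mean 50 and standard deviation 10, the sample size of 25, the 2,000 repetitions and the function name sampling_variance are all assumptions chosen for the example.

```python
import random
import statistics


def sampling_variance(estimator, n, repetitions, rng, mu=50.0, sigma=10.0):
    """Draw `repetitions` samples of size n from a normal population and
    return the average and the variance of the resulting estimates."""
    estimates = [
        estimator([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(repetitions)
    ]
    return statistics.mean(estimates), statistics.variance(estimates)


rng = random.Random(0)  # fixed seed so that the run is repeatable
mean_avg, mean_var = sampling_variance(statistics.mean, 25, 2000, rng)
median_avg, median_var = sampling_variance(statistics.median, 25, 2000, rng)

# Both averages should lie close to 50 (unbiasedness); the variance of the
# sample mean should be the smaller of the two (efficiency).
print(round(mean_avg, 2), round(mean_var, 2))
print(round(median_avg, 2), round(median_var, 2))
```

In theory the variance of the sample mean here is 100/25 = 4, while that of the sample median is larger (roughly one and a half times as large for a normal population), so the mean is the more efficient of the two estimators.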

1.4.2.

Type of Estimation

The Theory of Estimation was developed by Prof. R. A. Fisher in 1930 and has been grouped by him in two
classes:
1) Point Estimation
2) Interval Estimation

1.4.3.

Point Estimation

A point estimate is a specific value of a sample statistic that is used to estimate a population parameter. Point
estimation deals with the task of selecting a specific sample value as an estimate for a population parameter.
Point estimation of some population parameter is shown in figure 11.2. The population parameter of interest
might be the mean, variance, standard deviation, proportion or any other characteristic of the population. A
random sample gathered to estimate the value of an unknown population parameter will typically comprise n
observations of the variable of interest. The estimator of the population parameter is some function of these
sample observations.
[Figure: a population with an unknown parameter $\theta$ is sampled to give a typical sample of size n with values
$x_1, x_2, \ldots, x_n$; estimation then yields the point estimate $\hat{\theta} = f(x_1, x_2, \ldots, x_n)$ of $\theta$.]
Figure 11.2: Point Estimation of a Population Parameter

The point estimators for the population parameters $\mu$, $p$ and $\sigma^2$ are given in table 11.1.
Table 11.1: Point Estimators of $\mu$, $p$ and $\sigma^2$
Population Parameter; Point Estimator; Formula for Point Estimate
$\mu$; $\bar{X}$; $\bar{x} = \sum_{i=1}^{n} x_i / n$
$\mu$; Median; Middle value in sample (50%-ile)
$\mu$; 25% trimmed mean; Mean of middle 50% of values in sample
$\mu$; 10% trimmed mean; Mean of middle 80% of values in sample
$p$; $\hat{P}$; $\hat{p} = x/n$, where $x$ = number of successes in n trials
$\sigma^2$; $S^2$; $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$
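The point estimates listed in table 11.1 can be computed with Python's standard statistics module, as in the sketch below. The tyre lifetimes and the helper names trimmed_mean, point_estimates and sample_proportion are hypothetical; the trimmed mean is assumed to drop the stated percentage of observations from each end of the ordered sample.

```python
import statistics


def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the observations from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)   # observations dropped per tail
    return statistics.mean(ordered[k:len(ordered) - k])


def point_estimates(values):
    """Point estimates of the mean and variance from one sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "trimmed_25": trimmed_mean(values, 0.25),  # mean of middle 50%
        "trimmed_10": trimmed_mean(values, 0.10),  # mean of middle 80%
        "variance": statistics.variance(values),   # divisor n - 1
    }


def sample_proportion(successes, trials):
    """Point estimate of a population proportion, x / n."""
    return successes / trials


# Hypothetical tyre lifetimes in thousand km; the last value is an outlier.
lifetimes = [38, 41, 40, 39, 42, 40, 37, 43, 41, 75]
estimates = point_estimates(lifetimes)
print(estimates)
print(sample_proportion(3, 12))  # e.g. 3 defective articles out of 12
```

With this sample, the single outlier pulls the sample mean above the median and the trimmed means, which is one reason several different estimators of the same parameter are listed in the table.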

1.4.3.1.

Properties of Point Estimation

For a statistical point estimate, the sampling distribution of the estimator provides information about the best
estimator.
As different sample statistics can be used as point estimators of different population parameters, the following
general notations will be used in this section:
$\theta$ = population parameter (such as $\mu$, $\sigma$, $p$) of interest being estimated
$\hat{\theta}$ = sample statistic (such as $\bar{x}$, $s$, $\hat{p}$) or point estimator of $\theta$
Here, $\theta$ (theta) is the Greek letter and $\hat{\theta}$ is read as theta hat.
The criteria for selecting an estimator are:
1) Unbiasedness: The value of a statistic computed from a given sample is likely to lie above or below the actual value of the population parameter of interest because of sampling error. It is therefore desirable that the mean of the sampling distribution of the statistic equals the parameter; for example, that the mean of the sample means taken from a population equals the population mean. A sample statistic $\hat{\theta}$ with this property is said to be an unbiased estimator of the population parameter $\theta$, that is, $E(\hat{\theta}) = \theta$, where $E(\hat{\theta})$ is the expected value or mean of the sample statistic $\hat{\theta}$.
In the sampling distributions of the sample mean and the sample proportion, $E(\bar{X}) = \mu$ and $E(\bar{p}) = p$ respectively; therefore both $\bar{X}$ and $\bar{p}$ are unbiased estimators of the corresponding population parameters $\mu$ and $p$.
2) Consistency: A point estimator is said to be consistent if its value tends to become closer to the population parameter as the sample size increases. For example, the standard error of the sampling distribution of the mean, $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, becomes smaller as the sample size n increases. Thus, the sample mean $\bar{X}$ is a consistent estimator of the population mean $\mu$.
Similarly, the sample proportion $\bar{p}$ is a consistent estimator of the population proportion $p$ because its standard error, $\sigma_{\bar{p}} = \sqrt{p(1 - p)/n}$, also shrinks as n increases.
3) Efficiency: Efficiency is a relative term; the efficiency of an estimator is generally defined by comparing it with another estimator. Take two unbiased estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of $\theta$. The estimator $\hat{\theta}_1$ is called the more efficient estimator of $\theta$ if the variance of $\hat{\theta}_1$ is less than the variance of $\hat{\theta}_2$. Symbolically,
$\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$.
4) Sufficiency: A statistic is said to be a sufficient estimator of the parameter if it contains all the information in the sample about the parameter; that is, no other statistic computed from the same sample adds information about it. An estimator built on a sufficient statistic is usually also efficient and consistent. It need not, however, be unbiased.
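The unbiasedness and consistency criteria can be checked exactly on a tiny example. The sketch below is an illustration only: the five-value population is hypothetical, and samples are assumed to be drawn with replacement so that every ordered sample is equally likely. It lists every possible sample of size n and compares the mean and variance of the resulting sample means with $\mu$ and $\sigma^2 / n$.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical small population (an assumption for illustration).
population = [2, 4, 6, 8, 10]
mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance (divisor N)


def sample_mean_distribution(n):
    """Sample means of every equally likely ordered sample of size n,
    drawn with replacement from the population."""
    return [mean(s) for s in product(population, repeat=n)]


for n in (1, 2, 4):
    means = sample_mean_distribution(n)
    expected_value = mean(means)   # E(X-bar): matches mu for every n
    spread = pvariance(means)      # Var(X-bar): matches sigma2 / n
    print(n, expected_value, spread, sigma2 / n)
```

The expected value stays at the population mean for every n (unbiasedness), while the variance of the sample mean falls as n grows (the behaviour behind consistency).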

1.4.3.2.

Point Estimation of Population Mean ($\mu$)


The most common estimator of the population mean $\mu$ is the sample mean $\bar{X}$. Its popularity is in large part explained by the fact that it combines high efficiency with no bias. In fact, one can show that if X is normally distributed, the sample mean is more efficient than all other unbiased estimators of $\mu$.
1.4.3.3.

Point Estimation of Population Proportion ($\pi$)

The recommended estimator of the population proportion $\pi$ is the sample statistic P = X/n, where X is the number of successes found in the sample of n observations. The parameter $\pi$ is one of the two parameters in the binomial distribution and corresponds to the proportion of successes in the population at large. To estimate $\pi$, one simply counts the observed number of successes x in a sample of n trials and estimates $\pi$ as x/n.
For example, suppose one is interested in the proportion of residents in a metropolitan area who choose public transit for their journey to work. A sample of 100 residents finds that 26 took public transit. The best estimate of $\pi$ is p = 26/100 = 0.26.

1.4.3.4.

Point Estimation of Population Variance ($\sigma^2$)

For the population variance $\sigma^2$, the suggested estimator is $S^2$. Given a sample $x_1, x_2, \ldots, x_n$, the sample value of this random variable is
$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Recall that $\sigma^2 = E(X - \mu)^2$, the mean squared departure from the population mean. It therefore makes sense that the estimate of $\sigma^2$ is found by averaging the squared departures from the sample mean.

If one takes many samples of size n from a population, the values of $S^2$ computed for each sample would average to $\sigma^2$. On the other hand, if one were to divide by n, the estimator would be biased low and tend to underestimate $\sigma^2$. In estimating $\sigma^2$, the unknown population mean $\mu$ is replaced with the sample mean $\bar{X}$. That is, one is computing squared deviations around the sample mean, not the population mean.

The squared deviations around $\bar{x}$ are, in total, never larger than the squared deviations around $\mu$, because $\bar{x}$ is the value that minimizes the sum of squared deviations for the sample. An average of squared deviations around $\bar{x}$ therefore tends to underestimate $\sigma^2$. To compensate for this bias, one inflates the estimate by dividing the sum of squares by a number smaller than n, namely $n - 1$. In addition to being unbiased, $S^2$ is also the most efficient (minimum MSE) unbiased estimator of $\sigma^2$, provided X is normally distributed.
The estimator for $\sigma$, the population standard deviation, is the square root of $S^2$, or S. Although it is the most commonly used estimator, it is actually a biased estimator of $\sigma$. However, the degree of bias is very small for reasonable sample sizes, and S is much more convenient than alternative unbiased estimators.
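The effect of the divisor can be verified by brute force on a small case. The following sketch is illustrative only: the population is hypothetical and sampling is assumed to be with replacement. It averages three estimators over every possible sample of size 3: the sum of squares divided by n - 1, the sum of squares divided by n, and the square root S.

```python
from itertools import product
from math import sqrt
from statistics import mean, pvariance, variance

population = [2, 4, 6, 8, 10]    # hypothetical population
sigma2 = pvariance(population)   # true population variance
sigma = sqrt(sigma2)             # true population standard deviation

n = 3
samples = list(product(population, repeat=n))   # all equally likely samples

# Average of each estimator over the whole sampling distribution.
avg_s2_unbiased = mean(variance(s) for s in samples)   # divisor n - 1
avg_s2_biased = mean(pvariance(s) for s in samples)    # divisor n
avg_s = mean(sqrt(variance(s)) for s in samples)       # S = sqrt(S^2)

print(sigma2, avg_s2_unbiased, avg_s2_biased)
print(sigma, avg_s)
```

Under these assumptions the n - 1 version averages to the population variance, the divide-by-n version averages to (n - 1)/n times it, and the average of S falls somewhat below the population standard deviation.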

1.4.4.

Interval Estimation

An interval estimate is an interval of values, established from sample data, within which a population mean (or another parameter) is likely to fall.
A point estimate cannot be expected to coincide exactly with the population parameter. Suppose a survey finds that the average income of the sampled households is ₹3,00,000 per year. Individual households differ, some earning more than ₹3,00,000 and some less, so a different sample would give a somewhat different average. In other words, the point estimate will rarely coincide with the population parameter.
Interval estimation establishes an interval consisting of a lower limit and an upper limit in which the true value
of the population parameter is expected to fall. This interval is called Confidence Interval in the parlance of
inferential statistics.
The meaning of the expression "confidence interval" is that if you keep taking repeated samples and construct an interval from each one, a certain percentage of those intervals will contain the true value of the population parameter. The convention is to use a 95% confidence level, and sometimes a 99% confidence level.
Suppose you choose 95% as the confidence level. If you keep taking repeated samples, about 95% of the intervals constructed in this way will contain the true value. In this sense, you are 95% confident that the true value of the population parameter lies in the interval. The actual establishment of the confidence interval is built on the sampling distribution principle. The following figure captures the meaning of interval estimation:

Figure 11.3: Interval Estimation Diagram
(The confidence interval runs from the lower confidence limit to the upper confidence limit, with the point estimate lying between them.)

1.4.4.1.

Confidence Interval

A confidence interval (CI) is a particular kind of interval estimate of a population parameter. Instead of
estimating the parameter by a single value, an interval likely to include the parameter is given. Thus, confidence
intervals are used to indicate the reliability of an estimate. How likely the interval is to contain the parameter is
determined by the confidence level or confidence coefficient. Increasing the desired confidence level will
widen the confidence interval.
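To see why a higher confidence level widens the interval, the sketch below computes the critical value for several confidence levels. It assumes a normal-based interval of the kind described later in this section, whose half-width is the critical value multiplied by the standard error; the function name z_critical is a label chosen for this example.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z-value leaving an area of alpha/2 in the right tail of the
    standard normal distribution, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    z = z_critical(level)
    # The interval is 2 * z standard errors wide, so it grows with z.
    print(f"{level:.0%} confidence: z = {z:.3f}, "
          f"width = {2 * z:.3f} standard errors")
```

The critical values come out at roughly 1.645, 1.960 and 2.576, so moving from 90% to 99% confidence makes the interval a little more than one and a half times as wide for the same data.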
An interval estimate of a population mean can be developed using either the population standard deviation $\sigma$ or the sample standard deviation s to compute the margin of error.
It can be developed through the following cases:
1) Confidence Interval for Population Mean (Small Sample)
2) Confidence Interval for Population Mean (Large Sample)
i) Confidence Interval for Population Mean ($\sigma$ Known)
ii) Confidence Interval for Population Mean ($\sigma$ Unknown)
1.4.4.2.

Confidence Interval for Population Mean (Small Sample)

When the population standard deviation is not known and the sample size is small, the procedure of interval estimation of the population mean is based on a probability distribution known as the t-distribution. This distribution is very similar to the normal distribution. However, the t-distribution has more area in the tails and less in the center than the normal distribution. The t-distribution depends on a parameter known as degrees of freedom. As the number of degrees of freedom increases, the t-distribution gradually approaches the normal distribution, and the sample standard deviation s becomes a better estimate of the population standard deviation $\sigma$.
The interval estimate of a population mean when the sample size is small ($n < 30$) with confidence coefficient $(1 - \alpha)$ is given by:
$\bar{X} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$ or $\bar{X} - t_{\alpha/2} \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2} \frac{s}{\sqrt{n}}$
where $t_{\alpha/2}$ is the critical value of the t-test statistic providing an area $\alpha/2$ in the right tail of the t-distribution with $(n - 1)$ degrees of freedom, and
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
The critical values of t for the given degrees of freedom can be obtained from the table of the t-distribution.
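A minimal sketch of the small-sample calculation is given below. The ten data values are hypothetical, and the critical value 2.262 (for $\alpha/2 = 0.025$ with 9 degrees of freedom) is assumed to have been read from a standard t-table, as the text describes, rather than computed in code.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(sample, t_critical):
    """Small-sample confidence interval for the mean: x_bar +/- t * s / sqrt(n).

    t_critical is the tabulated t value for alpha/2 with n - 1 degrees
    of freedom; it must be supplied by the caller.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                   # sample standard deviation, divisor n - 1
    margin = t_critical * s / sqrt(n)   # margin of error
    return x_bar - margin, x_bar + margin


# Hypothetical sample of n = 10 observations.
sample = [48, 52, 50, 47, 53, 49, 51, 50, 46, 54]

# Assumed table value: t for a 95% confidence level with 9 degrees of freedom.
T_CRITICAL_DF9_95 = 2.262

lower, upper = t_interval(sample, T_CRITICAL_DF9_95)
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
```

Passing the critical value in as an argument keeps the sketch free of third-party libraries; a statistics package with a t-distribution quantile function could supply the same number.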

1.4.4.3.

Confidence Interval for Population Mean (Large Sample)

(Diagram: Confidence Interval for Population Mean (Large Sample), branching into two cases: $\sigma$ Known and $\sigma$ Unknown.)

1.4.4.4.

Confidence Interval for Population Mean ($\sigma$ Known)

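Suppose the population mean $\mu$ is unknown and the true population standard deviation $\sigma$ is known. Then for a large sample size ($n \ge 30$), the sample mean $\bar{X}$ is the best point estimator for the population mean $\mu$. Since the sampling distribution of $\bar{X}$ is approximately normal, it can be used to compute the confidence interval of the population mean $\mu$ as follows:
$\bar{X} \pm z_{\alpha/2} \sigma_{\bar{x}}$ or $\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
or, $\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
where $z_{\alpha/2}$ = z-value representing an area $\alpha/2$ in the right tail of the standard normal probability distribution
$(1 - \alpha)$ = level of confidence (as shown in figure 11.4)

Figure 11.4: Sampling Distribution of Mean $\bar{x}$
(Normal curve of the estimator $\bar{x}$ centred on $\mu$: the central area between $\mu - z_{\alpha/2} \sigma_{\bar{x}}$ and $\mu + z_{\alpha/2} \sigma_{\bar{x}}$ is the confidence level $1 - \alpha = 95\%$, and each tail has area $\alpha/2 = 0.025$.)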
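The known-$\sigma$ interval can be sketched in a few lines. The numbers used (a sample mean of 120, $\sigma$ = 15, n = 36) are hypothetical and chosen only to make the arithmetic easy to follow; the function name is likewise an assumption of this example.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_known_sigma(x_bar, sigma, n, confidence=0.95):
    """Large-sample confidence interval for the mean when sigma is known:
    x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    standard_error = sigma / sqrt(n)          # sigma of the sample mean
    margin = z * standard_error
    return x_bar - margin, x_bar + margin


# Hypothetical summary statistics.
lower, upper = z_interval_known_sigma(x_bar=120.0, sigma=15.0, n=36)
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

Here the standard error is 15 / 6 = 2.5, so the 95% margin of error is about 1.96 × 2.5 ≈ 4.9 on either side of the sample mean.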

1.4.4.4.1. Confidence Interval for Population Mean ($\sigma$ Unknown)


If the standard deviation of a population is not known, it can be approximated by the sample standard deviation s when the sample size is large ($n \ge 30$). So, the interval estimator of a population mean $\mu$ for a large sample ($n \ge 30$) with confidence coefficient $1 - \alpha$ is given by:
$\bar{X} \pm z_{\alpha/2} s_{\bar{x}}$ or $\bar{X} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$
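When only raw data are available, the same calculation can start from the observations themselves. The sketch below is an illustration under stated assumptions: the 40 data values are generated by an arbitrary formula purely to give a reproducible large sample, and the guard on n simply mirrors the large-sample condition in the text.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def large_sample_interval(sample, confidence=0.95):
    """Confidence interval for the mean when sigma is unknown and n >= 30:
    x_bar +/- z_{alpha/2} * s / sqrt(n), with s estimated from the sample."""
    n = len(sample)
    if n < 30:
        raise ValueError("the large-sample formula assumes n >= 30")
    x_bar = mean(sample)
    s = stdev(sample)                                   # divisor n - 1
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical sample of 40 values between 60 and 70.
sample = [60 + (7 * i) % 11 for i in range(40)]

lower, upper = large_sample_interval(sample)
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

For samples smaller than 30 the function refuses to proceed, since the t-based interval of the small-sample case would be the appropriate choice there.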
