WHAT IS STATISTICAL SAMPLING?

Researchers often want to know the answers to big questions. These
questions may or may not be profound, but they are large in their scope. What
did everyone in a particular country watch on television last night? Who does
the electorate intend to vote for in an upcoming election? How many birds
return from migration at a certain location? What percentage of the workforce
is unemployed?

These questions are huge in the sense that they require us to keep track
of millions of individuals. Statistics simplifies these problems by using a
technique called sampling. By conducting a statistical sample, our workload can
be cut down immensely: rather than tracking the behaviours of millions or
billions of individuals, we need only examine thousands or hundreds. As we will
see, this simplification comes at a price.

Populations and Censuses

The population of a statistical study is what we're trying to find out
something about. It consists of all of the individuals who are being examined. A
population can really be anything: Californians, caribou, computers, cars, or
counties could all be considered populations, depending on the statistical
question. Although most populations being researched are large, they do not
necessarily have to be.

Samples

Since it’s normally either impossible or impractical to track down every
member of a population, the next option available is to sample the population. A
sample is any subset of a population, so its size can be small or large. We want
a sample small enough to be manageable by our computing power, yet large
enough to give us statistically significant results.

Some Words of Advice

As the saying goes, “Well begun is half done.” To ensure that our
statistical studies and experiments have good results, we need to plan and start
them carefully. It’s easy to come up with bad statistical samples. Good simple
random samples require some work to obtain. If our data has been obtained
haphazardly and in a cavalier manner, then no matter how sophisticated our
analysis, statistical techniques will not give us any worthwhile conclusions.
------------------------------------------------------------------------------------------
Sampling (statistics)
From Wikipedia, the free encyclopedia

[Figure: A visual representation of the sampling process.]

In statistics, quality assurance, and survey methodology, sampling is
concerned with the selection of a subset of individuals from within a statistical
population to estimate characteristics of the whole population. Each observation
measures one or more properties (such as weight, location, color) of observable
bodies distinguished as independent objects or individuals. In survey sampling,
weights can be applied to the data to adjust for the sample design, particularly
stratified sampling. Results from probability theory and statistical theory are
employed to guide practice. In business and medical research, sampling is
widely used for gathering information about a population.

The sampling process comprises several stages:

- Defining the population of concern.
- Specifying a sampling frame, a set of items or events possible to
measure.
- Specifying a sampling method for selecting items or events from
the frame.
- Determining the sample size.
- Implementing the sampling plan.
- Sampling and data collecting.
- Reviewing the sampling process.
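A minimal sketch of these stages in code, using Python's standard library; the population, the rule defining the frame, and the sample size below are all hypothetical:

```python
import random

# Stage 1: define the population of concern (hypothetical member IDs).
population = list(range(1, 1001))

# Stage 2: specify a sampling frame -- the units we can actually measure
# (here we pretend some units are unreachable).
frame = [unit for unit in population if unit % 7 != 0]

# Stages 3 and 4: choose a sampling method (simple random sampling
# without replacement) and determine the sample size.
sample_size = 50

# Stages 5 and 6: implement the plan and collect the sample.
random.seed(42)  # fixed seed so the sketch is reproducible
sample = random.sample(frame, sample_size)

print(len(sample))  # 50 distinct units drawn from the frame
```

Note that the frame, not the population, is what gets sampled; units outside the frame can never appear, which is one source of coverage error discussed later.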

Population definition

Successful statistical practice is based on focused problem definition. In
sampling, this includes defining the population from which our sample is drawn.
A population can be defined as including all people or items with the
characteristic one wishes to understand. Because there is very rarely enough
time or money to gather information from everyone or everything in a
population, the goal becomes finding a representative sample (or subset) of that
population.

Sometimes what defines a population is obvious. For example, a
manufacturer needs to decide whether a batch of material from production is of
high enough quality to be released to the customer, or should be sentenced for
scrap or rework due to poor quality. In this case, the batch is the population.
------------------------------------------------------------------------------------------
WHAT IS SAMPLING? WHAT ARE ITS CHARACTERISTICS,
ADVANTAGES AND DISADVANTAGES?

Introduction and Meaning

In research methodology, the practical formulation of the research is very
important and should be done carefully, with proper concentration and under
good guidance. During the practical formulation of the research, however, one
tends to run into a large number of problems. These problems generally relate
to learning the features of the universe, or population, on the basis of studying
the characteristics of a specific part or portion of it, generally called the sample.

Sampling can therefore be defined as the method or technique of selecting
such a part or portion (the sample) for study, with a view to drawing conclusions
about the universe or population.

Basic Principles of Sampling

The theory of sampling is based on the following laws:

• Law of Statistical Regularity – This law comes from the mathematical
theory of probability. According to King, "the Law of Statistical Regularity says
that a moderately large number of items chosen at random from a large group
are almost sure, on the average, to possess the features of the large group."
According to this law, the units of the sample must be selected at random.
• Law of Inertia of Large Numbers – According to this law, other things
being equal, the larger the size of the sample, the more accurate the results are
likely to be.
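The Law of Inertia of Large Numbers can be illustrated with a small simulation; the population parameters below are invented for the sketch and the result holds on average, not for any single draw:

```python
import random
import statistics

random.seed(0)
# Hypothetical population: 100,000 values with a known mean.
population = [random.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.mean(population)

def mean_abs_error(n, trials=200):
    """Average absolute error of the sample mean over repeated draws of size n."""
    errors = [abs(statistics.mean(random.sample(population, n)) - true_mean)
              for _ in range(trials)]
    return statistics.mean(errors)

# The larger sample is, on average, the more accurate estimator.
small, large = mean_abs_error(30), mean_abs_error(3000)
print(small > large)
```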

Characteristics of the sampling technique

1. Much cheaper.
2. Saves time.
3. More reliable.
4. Very suitable for carrying out different surveys.
5. Scientific in nature.

Advantages of sampling
1. Very accurate.
2. Economical in nature.
3. Very reliable.
4. High suitability ratio towards the different surveys.
5. Takes less time.
6. When the universe is very large, the sampling method is the only
practical method of collecting the data.

Disadvantages of sampling
1. Inadequacy of the samples.
2. Chances for bias.
3. Problems of accuracy.
4. Difficulty of getting the representative sample.
5. Untrained manpower.
6. Absence of the informants.
7. Chances of committing the errors in sampling.
RANDOM NUMBER TABLE
From Wikipedia, the free encyclopedia

Random number tables have been used in statistics for tasks such as
selecting random samples. This was much more effective than manually selecting
the random samples (with dice, cards, etc.). Nowadays, tables of random
numbers have been replaced by computational random number generators.

If carefully prepared, the filtering and testing processes remove any
noticeable bias or asymmetry from the hardware-generated original numbers so
that such tables provide the most "reliable" random numbers available to the
casual user.

Note that any published (or otherwise accessible) random data table is
unsuitable for cryptographic purposes since the accessibility of the numbers
makes them effectively predictable, and hence their effect on a cryptosystem is
also predictable. By way of contrast, genuinely random numbers that are only
accessible to the intended encoder and decoder allow literally unbreakable
encryption of a similar or lesser amount of meaningful data (using a simple
exclusive OR operation) in a method known as the one-time pad, which has
often-insurmountable practical barriers to correct implementation.

The first "testing" of random numbers for statistical randomness was
developed by M. G. Kendall and B. Babington Smith in the late 1930s, and was
based upon looking for certain types of probabilistic expectations in a given
sequence. The simplest test looked to make sure that roughly equal numbers of
1s, 2s, 3s, etc. were present; more complicated tests looked for the number of
digits between successive 0s and compared the total counts with their expected
probabilities. Over the years more complicated tests were developed. Kendall
and Smith also created the notion of "local randomness", whereby a given set of
random numbers would be broken down and tested in segments. In their set of
100,000 numbers, for example, two of the thousands were somewhat less
"locally random" than the rest, but the set as a whole would pass its tests.
Kendall and Smith advised their readers not to use those particular thousands by
themselves as a consequence.
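A sketch of the simplest of these tests, the digit-frequency check, using a Pearson chi-square statistic; the 16.92 cutoff is the standard 5% critical value for 9 degrees of freedom, a detail supplied here and not stated in the source:

```python
# Kendall and Smith's simplest test: do the ten digits appear in roughly
# equal numbers? Checked here with a Pearson chi-square statistic.
def frequency_test(digits):
    n = len(digits)
    expected = n / 10
    counts = [digits.count(d) for d in range(10)]
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return chi2 < 16.92  # approximate 5% critical value, 9 degrees of freedom

print(frequency_test(list(range(10)) * 1000))  # perfectly uniform digits: True
print(frequency_test([7] * 10_000))            # a constant sequence: False
```

As the article notes for "local randomness", a sequence that passes this test as a whole may still contain segments that would fail it on their own.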
------------------------------------------------------------------------------------------
SAMPLING ERROR
From Wikipedia, the free encyclopedia

In statistics, sampling error is incurred when the statistical
characteristics of a population are estimated from a subset, or sample, of that
population. Since the sample does not include all members of the population,
statistics on the sample, such as means and quantiles, generally differ from
parameters on the entire population. For example, if one measures the height of
a thousand individuals from a country of one million, the average height of the
thousand is typically not the same as the average height of all one million people
in the country. Since sampling is typically done to determine the characteristics
of a whole population, the difference between the sample and population values
is considered a sampling error.[1] Exact measurement of sampling error is
generally not feasible since the true population values are unknown; however,
sampling error can often be estimated by probabilistic modeling of the sample.
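The height example can be mimicked with a small simulation; the population size and distribution parameters below are invented for illustration:

```python
import random
import statistics

random.seed(7)
# A hypothetical population of one million "heights" in centimetres.
population = [random.gauss(170, 8) for _ in range(1_000_000)]
population_mean = statistics.mean(population)

# Measuring a sample of one thousand: the sample mean is close to, but
# generally not equal to, the population mean. That gap is the sampling error.
sample = random.sample(population, 1_000)
sampling_error = statistics.mean(sample) - population_mean
print(sampling_error)  # typically a fraction of a centimetre, but not exactly zero
```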

Random sampling
Main article: Random sampling

In statistics, sampling error is the error caused by observing a sample
instead of the whole population.[1] The sampling error is the difference between a
sample statistic used to estimate a population parameter and the actual but
unknown value of the parameter (Burns & Grove, 2009). An estimate of a
quantity of interest, such as an average or percentage, will generally be subject
to sample-to-sample variation.[1] These variations in the possible sample values
of a statistic can theoretically be expressed as sampling errors, although in
practice the exact sampling error is typically unknown. Sampling error also
refers more broadly to this phenomenon of random sampling variation.

Random sampling, and its derived terms such as sampling error, imply
specific procedures for gathering and analyzing data that are rigorously applied
as a method for arriving at results considered representative of a given
population as a whole. Despite a common misunderstanding, "random" does not
mean the same thing as "chance" as this idea is often used in describing
situations of uncertainty, nor is it the same as projections based on an assessed
probability or frequency. Sampling always refers to a procedure of gathering
data from a small aggregation of individuals that is purportedly representative of
a larger grouping which must in principle be capable of being measured as a
totality. Random sampling is used precisely to ensure a truly representative
sample from which to draw conclusions, in which the same results would be
arrived at if one had included the entirety of the population instead. Random
sampling (and sampling error) can only be used to gather information about a
single defined point in time. If additional data is gathered (other things
remaining constant) then comparison across time periods may be possible.
However, this comparison is distinct from any sampling itself. As a method for
gathering data within the field of statistics, random sampling is recognized as
clearly distinct from the causal process that one is trying to measure. The
conducting of research itself may lead to certain outcomes affecting the
researched group, but this effect is not what is called sampling error. Sampling
error always refers to the recognized limitations of any supposedly
representative sample population in reflecting the larger totality, and the error
refers only to the discrepancy that may result from judging the whole on the
basis of a much smaller number. This is only an "error" in the sense that it would
automatically be corrected if the totality were itself assessed. The term has no
real meaning outside of statistics.
According to a differing view, a potential example of a sampling error in
evolution is genetic drift: a change in a population's allele frequencies due to
chance. One example is the bottleneck effect: when a natural disaster
dramatically reduces the size of a population, the resulting small population may
or may not fairly represent the original population. What may make the
bottleneck effect a sampling error is that certain alleles, because of the disaster,
become more common while others may disappear completely. Another example
of genetic drift that is a potential sampling error is the founder effect, which
occurs when a few individuals from a larger population settle a new isolated
area. In this instance, only a few individuals, carrying little genetic variety,
represent the original population, making it a potential sampling error.[2]

The likely size of the sampling error can generally be controlled by taking
a large enough random sample from the population,[3] although the cost of doing
this may be prohibitive; see sample size and statistical power for more detail. If
the observations are collected from a random sample, statistical theory provides
probabilistic estimates of the likely size of the sampling error for a particular
statistic or estimator. These are often expressed in terms of its standard error.
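For the sample mean, the standard error mentioned above is estimated as s/√n, the sample standard deviation divided by the square root of the sample size; a short sketch with made-up measurements:

```python
import math
import statistics

# The likely size of the sampling error for a sample mean is usually
# expressed as its standard error: s / sqrt(n), where s is the sample
# standard deviation and n is the sample size.
def standard_error(sample):
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical height measurements in centimetres.
measurements = [172.1, 168.4, 175.0, 169.9, 171.3, 166.8, 174.2, 170.5]
se = standard_error(measurements)
print(round(se, 2))  # 0.98
```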

Bias problems

Sampling bias is a possible source of sampling error. It leads to sampling
errors which tend to be systematically positive or negative. Such errors can be
considered systematic errors.

Non-sampling error

Sampling error can be contrasted with non-sampling error. Non-sampling
error is a catch-all term for the deviations from the true value that are not a
function of the sample chosen, including various systematic errors and any
random errors that are not due to sampling. Non-sampling errors are much
harder to quantify than sampling error.[3]
------------------------------------------------------------------------------------------

NON-SAMPLING ERROR
From Wikipedia, the free encyclopedia

In statistics, non-sampling error is a catch-all term for the deviations of
estimates from their true values that are not a function of the sample chosen,
including various systematic errors and random errors that are not due to
sampling.[1] Non-sampling errors are much harder to quantify than sampling
errors.[2]

Non-sampling errors in survey estimates can arise from:[3]

- Coverage errors, such as failure to accurately represent all population
units in the sample, or the inability to obtain information about all sample
cases;
- Response errors by respondents due for example to definitional
differences, misunderstandings, or deliberate misreporting;
- Mistakes in recording the data or coding it to standard classifications;
- Other errors of collection, nonresponse, processing, or imputation of
values for missing or inconsistent data.[3]
------------------------------------------------------------------------------------------
[Table summarising types of error; not reproduced.]

------------------------------------------------------------------------------------------
SIMPLE RANDOM SAMPLING

[Figure: A visual representation of selecting a simple random sample.]

In a simple random sample (SRS) of a given size, all subsets of the
frame of that size are given an equal probability of selection. Furthermore, any
given pair of elements has the same chance of selection as any other such pair
(and similarly for triples, and so on). This minimises bias and simplifies analysis
of results. In
particular, the variance between individual results within the sample is a good
indicator of variance in the overall population, which makes it relatively easy to
estimate the accuracy of results.
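The claim that within-sample variance tracks population variance can be checked with a quick simulation; the population parameters here are invented:

```python
import random
import statistics

random.seed(5)
# Hypothetical population; in an SRS, the variance within the sample is a
# good indicator of the variance in the overall population.
population = [random.gauss(0, 3) for _ in range(50_000)]
sample = random.sample(population, 2_000)

pop_var = statistics.pvariance(population)   # population variance
sample_var = statistics.variance(sample)     # sample variance (n - 1 divisor)
print(abs(sample_var - pop_var) / pop_var < 0.2)  # relative error is small
```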

However, SRS can be vulnerable to sampling error because the
randomness of the selection may result in a sample that doesn't reflect the
makeup of the population. For instance, a simple random sample of ten people
from a given country will on average produce five men and five women, but any
given trial is likely to overrepresent one sex and underrepresent the other.
Systematic and stratified techniques attempt to overcome this problem by
using information about the population to choose a more representative
sample.

SRS may also be cumbersome and tedious when sampling from an
unusually large target population. In some cases, investigators are interested in
research questions specific to subgroups of the population. For example,
researchers might be interested in examining whether cognitive ability as a
predictor of job performance is equally applicable across racial groups. SRS
cannot accommodate the needs of researchers in this situation because it does
not provide subsamples of the population. Stratified sampling addresses this
weakness of SRS.
------------------------------------------------------------------------------------------
SYSTEMATIC SAMPLING
Main article: Systematic sampling

[Figure: A visual representation of selecting a random sample using the systematic sampling technique.]

Systematic sampling relies on arranging the study population according to
some ordering scheme and then selecting elements at regular intervals through
that ordered list. Systematic sampling involves a random start and then
proceeds with the selection of every kth element from then onwards. In this
case, k=(population size/sample size). It is important that the starting point is
not automatically the first in the list, but is instead randomly chosen from within
the first to the kth element in the list. A simple example would be to select every
10th name from the telephone directory (an 'every 10th' sample, also referred to
as 'sampling with a skip of 10').
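The procedure can be sketched as follows, assuming for simplicity that the frame size is an exact multiple of the sample size; the function and directory names are hypothetical:

```python
import random

# Systematic sampling: a random start within the first k elements, then
# every kth element thereafter, where k = population size / sample size.
def systematic_sample(frame, sample_size):
    k = len(frame) // sample_size   # the sampling interval, or "skip"
    start = random.randrange(k)     # random start: crucial to avoid bias
    return frame[start::k][:sample_size]

random.seed(11)
directory = [f"name_{i:04d}" for i in range(1000)]  # hypothetical directory
every_10th = systematic_sample(directory, 100)      # a 'skip of 10' sample
print(len(every_10th))  # 100
```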

As long as the starting point is randomized, systematic sampling is a type
of probability sampling. It is easy to implement and the stratification induced can
make it efficient, if the variable by which the list is ordered is correlated with the
variable of interest. 'Every 10th' sampling is especially useful for efficient
sampling from databases.

For example, suppose we wish to sample people from a long street that
starts in a poor area (house No. 1) and ends in an expensive district (house No.
1000). A simple random selection of addresses from this street could easily end
up with too many from the high end and too few from the low end (or vice
versa), leading to an unrepresentative sample. Selecting (e.g.) every 10th street
number along the street ensures that the sample is spread evenly along the
length of the street, representing all of these districts. (Note that if we always
start at house #1 and end at #991, the sample is slightly biased towards the low
end; by randomly selecting the start between #1 and #10, this bias is
eliminated.)

However, systematic sampling is especially vulnerable to periodicities in
the list. If periodicity is present and the period is a multiple or factor of the
interval used, the sample is especially likely to be unrepresentative of the overall
population, making the scheme less accurate than simple random sampling.
For example, consider a street where the odd-numbered houses are all on the
north (expensive) side of the road, and the even-numbered houses are all on the
south (cheap) side. Under the sampling scheme given above, it is impossible to
get a representative sample; either the houses sampled will all be from the odd-
numbered, expensive side, or they will all be from the even-numbered, cheap
side, unless the researcher has previous knowledge of this bias and avoids it by
using a skip which ensures jumping between the two sides (any odd-numbered
skip).

Another drawback of systematic sampling is that even in scenarios where
it is more accurate than SRS, its theoretical properties make it difficult to
quantify that accuracy. (In the two examples of systematic sampling that are
given above, much of the potential sampling error is due to variation between
neighbouring houses – but because this method never selects two neighbouring
houses, the sample will not give us any information on that variation.)

As described above, systematic sampling is an EPS (equal probability of
selection) method, because all elements have the same probability of selection
(in the example given, one in
ten). It is not 'simple random sampling' because different subsets of the same
size have different selection probabilities – e.g. the set {4,14,24,...,994} has a
one-in-ten probability of selection, but the set {4,13,24,34,...} has zero
probability of selection.

Systematic sampling can also be adapted to a non-EPS approach; for an
example, see discussion of PPS samples below.
------------------------------------------------------------------------------------------

STRATIFIED SAMPLING
Main article: Stratified sampling

[Figure: A visual representation of selecting a random sample using the stratified sampling technique.]

Where the population embraces a number of distinct categories, the frame
can be organized by these categories into separate "strata." Each stratum is
then sampled as an independent sub-population, out of which individual
elements can be randomly selected.[1] There are several potential benefits to
stratified sampling.

First, dividing the population into distinct, independent strata can enable
researchers to draw inferences about specific subgroups that may be lost in a
more generalized random sample.

Second, utilizing a stratified sampling method can lead to more efficient
statistical estimates (provided that strata are selected based upon relevance to
the criterion in question, instead of availability of the samples). Even if a
stratified sampling approach does not lead to increased statistical efficiency,
such a tactic will not result in less efficiency than would simple random
sampling, provided that each stratum is proportional to the group's size in the
population.

Third, it is sometimes the case that data are more readily available for
individual, pre-existing strata within a population than for the overall population;
in such cases, using a stratified sampling approach may be more convenient
than aggregating data across groups (though this may potentially be at odds
with the previously noted importance of utilizing criterion-relevant strata).

Finally, since each stratum is treated as an independent population,
different sampling approaches can be applied to different strata, potentially
enabling researchers to use the approach best suited (or most cost-effective) for
each identified subgroup within the population.

There are, however, some potential drawbacks to using stratified
sampling. First, identifying strata and implementing such an approach can
increase the cost and complexity of sample selection, as well as leading to
increased complexity of population estimates. Second, when examining multiple
criteria, stratifying variables may be related to some, but not to others, further
complicating the design, and potentially reducing the utility of the strata. Finally,
in some cases (such as designs with a large number of strata, or those with a
specified minimum sample size per group), stratified sampling can potentially
require a larger sample than would other methods (although in most cases, the
required sample size would be no larger than would be required for simple
random sampling).

A stratified sampling approach is most effective when three conditions are
met:
1. Variability within strata is minimized.
2. Variability between strata is maximized.
3. The variables upon which the population is stratified are strongly
correlated with the desired dependent variable.
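When the strata are fixed, each can be sampled independently in proportion to its share of the population; a minimal sketch with hypothetical strata and sizes:

```python
import random

# Proportionate stratified sampling: each stratum is sampled independently,
# with a sample size proportional to the stratum's share of the population.
def stratified_sample(strata, total_size):
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n = round(total_size * len(members) / population_size)
        sample[name] = random.sample(members, n)  # SRS within the stratum
    return sample

random.seed(2)
strata = {
    "urban": list(range(0, 700)),     # 70% of the population
    "rural": list(range(700, 1000)),  # 30% of the population
}
sample = stratified_sample(strata, 100)
print(len(sample["urban"]), len(sample["rural"]))  # 70 30
```

Because each stratum is sampled separately, a different technique could be substituted inside any one stratum, which is the fourth benefit noted above.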

Advantages over other sampling methods

1. Focuses on important subpopulations and ignores irrelevant ones.
2. Allows use of different sampling techniques for different subpopulations.
3. Improves the accuracy/efficiency of estimation.
4. Permits greater balancing of statistical power of tests of differences
between strata by sampling equal numbers from strata varying widely in
size.

Disadvantages

1. Requires selection of relevant stratification variables which can be difficult.
2. Is not useful when there are no homogeneous subgroups.
3. Can be expensive to implement.

------------------------------------------------------------------------------------------

CLUSTER SAMPLING
From Wikipedia, the free encyclopedia
[Figure: A visual representation of selecting a random sample using the cluster sampling technique.]

Cluster sampling is a sampling technique used when "natural" but
relatively homogeneous groupings are evident in a statistical population. It is
often used in marketing research. In this technique, the total population is
divided into these groups (or clusters) and a simple random sample of the
groups is selected. Then the required information is collected from a simple
random sample of the elements within each selected group. This may be done
for every element in these groups or a subsample of elements may be selected
within each of these groups. A common motivation for cluster sampling is to
reduce the total number of interviews and costs given the desired accuracy.
Assuming a fixed sample size, the technique gives more accurate results when
most of the variation in the population is within the groups, not between them.

Cluster elements

The population within a cluster should ideally be as heterogeneous as
possible, but there should be homogeneity between cluster means. Each cluster
should be a small-scale representation of the total population. The clusters
should be mutually exclusive and collectively exhaustive. A random sampling
technique is then used on any relevant clusters to choose which clusters to
include in the study. In single-stage cluster sampling, all the elements from each
of the selected clusters are used. In two-stage cluster sampling, a random
sampling technique is applied to the elements from each of the selected
clusters.

The main difference between cluster sampling and stratified sampling is
that in cluster sampling the cluster is treated as the sampling unit so analysis is
done on a population of clusters (at least in the first stage). In stratified
sampling, the analysis is done on elements within strata. In stratified sampling,
a random sample is drawn from each of the strata, whereas in cluster sampling
only the selected clusters are studied. The main objective of cluster sampling is
to reduce costs by increasing sampling efficiency. This contrasts with stratified
sampling where the main objective is to increase precision.

There also exists multistage sampling, in which more than two steps are
taken in selecting clusters from clusters.

Aspects of cluster sampling

One version of cluster sampling is area sampling or geographical
cluster sampling. Clusters consist of geographical areas. Because a
geographically dispersed population can be expensive to survey, greater
economy than simple random sampling can be achieved by treating several
respondents within a local area as a cluster. It is usually necessary to increase
the total sample size to achieve equivalent precision in the estimators, but cost
savings may make that feasible.
In some situations, cluster sampling is only appropriate when the clusters
are approximately the same size. This can be achieved by combining clusters. If
this is not possible, probability proportionate to size sampling is used. In
this method, the probability of selecting any cluster varies with the size of the
cluster, giving larger clusters a greater probability of selection and smaller
clusters a lower probability. However, if clusters are selected with probability
proportionate to size, the same number of interviews should be carried out in
each sampled cluster so that each unit sampled has the same probability of
selection.
Cluster sampling is used to estimate high mortalities in cases such as
wars, famines and natural disasters.[1]

Advantages
- Can be cheaper than other methods – e.g. fewer travel expenses,
administration costs.
- Feasibility: This method takes large populations into account. Since these
groups are so large, deploying any other technique would be a very difficult
task. It is feasible only when dealing with a large population.
- Economy: The two regular major concerns of expenditure, i.e. traveling and
listing, are greatly reduced in this method. For example, compiling research
information about every household in a city would be very difficult, whereas
compiling information about the various blocks of the city would be easier.
Here, traveling as well as listing efforts are greatly reduced.
- Reduced variability: When estimates are computed by any other method,
reduced variability in results is observed. This may not be an ideal situation
every time.

Disadvantages
- Higher sampling error, which can be expressed in the so-called "design effect",
the ratio between the number of subjects in the cluster study and the number of
subjects in an equally reliable, randomly sampled unclustered study.[2]
- Biased samples: If the group in the population that is chosen as a sample has
a biased opinion, then the entire population is inferred to have the same
opinion. This may not be the actual case.
- Errors: Other probabilistic methods give fewer errors than this method. For
this reason, it is discouraged for beginners.
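The design effect mentioned above is often approximated in textbooks as deff = 1 + (m − 1)ρ, where m is the average cluster size and ρ the intra-cluster correlation; this formula is a standard supplement, not taken from the source, and the values below are hypothetical:

```python
# Textbook approximation of the design effect for a cluster sample:
# deff = 1 + (m - 1) * rho, where m is the average cluster size and
# rho is the intra-cluster correlation.
def design_effect(avg_cluster_size, intra_cluster_corr):
    return 1 + (avg_cluster_size - 1) * intra_cluster_corr

deff = design_effect(avg_cluster_size=20, intra_cluster_corr=0.05)
print(deff)  # 1.95 -- the clustered design needs ~1.95x the SRS sample size
```

A deff of 1 (clusters of size 1, or zero intra-cluster correlation) means clustering costs nothing in precision; larger values quantify the extra sample needed to match an unclustered SRS.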
------------------------------------------------------------------------------------------

MORE ON CLUSTER SAMPLING

Two-stage cluster sampling

Two-stage cluster sampling, a simple case of multistage sampling, is
obtained by selecting cluster samples in the first stage and then selecting a
sample of elements from every sampled cluster. Consider a population of N
clusters in total. In the first stage, n clusters are selected using the ordinary
cluster sampling method. In the second stage, simple random sampling is
usually used.[3] It is applied separately in every cluster, and the numbers of
elements selected from different clusters are not necessarily equal. The total
number of clusters N, the number of clusters selected n, and the numbers of
elements from selected clusters need to be pre-determined by the survey
designer. Two-stage cluster sampling aims at minimizing survey costs while at
the same time controlling the uncertainty related to estimates of interest.[4] This
method can be used in health and social sciences. For instance, researchers used
two-stage cluster sampling to generate a representative sample of the Iraqi
population to conduct mortality surveys.[5] Sampling in this method can be
quicker and more reliable than other methods, which is why it is now used
frequently.
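The two-stage procedure can be sketched as follows; the cluster counts and sizes are hypothetical, and an equal take per cluster is assumed for simplicity even though, as noted above, the numbers need not be equal:

```python
import random

# Two-stage cluster sampling: first select n clusters at random, then draw
# a simple random sample of elements within each selected cluster.
def two_stage_sample(clusters, n_clusters, per_cluster):
    chosen = random.sample(clusters, n_clusters)            # stage 1: clusters
    return [random.sample(c, per_cluster) for c in chosen]  # stage 2: SRS within

random.seed(9)
# N = 50 clusters ("blocks") of 40 hypothetical households each.
blocks = [[f"block{b}_house{h}" for h in range(40)] for b in range(50)]
sample = two_stage_sample(blocks, n_clusters=5, per_cluster=8)

print(sum(len(s) for s in sample))  # 40 households (5 blocks x 8 each)
```

Only the five chosen blocks ever need a household-level list, which is the cost saving multistage sampling is designed to capture.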
------------------------------------------------------------------------------------------

Sometimes it is more cost-effective to select respondents in groups
('clusters'). Sampling is often clustered by geography, or by time periods.
(Nearly all samples are in some sense 'clustered' in time – although this is rarely
taken into account in the analysis.) For instance, if surveying households within
a city, we might choose to select 100 city blocks and then interview every
household within the selected blocks.

Clustering can reduce travel and administrative costs. In the example
above, an interviewer can make a single trip to visit several households in one
block, rather than having to drive to a different block for each household.

It also means that one does not need a sampling frame listing all
elements in the target population. Instead, clusters can be chosen from a
cluster-level frame, with an element-level frame created only for the selected
clusters. In the example above, the sample only requires a block-level city map
for initial selections, and then a household-level map of the 100 selected blocks,
rather than a household-level map of the whole city.

Cluster sampling generally increases the variability of sample estimates
above that of simple random sampling, depending on how the clusters differ
between themselves, as compared with the within-cluster variation. For this
reason, cluster sampling requires a larger sample than SRS to achieve the same
level of accuracy – but cost savings from clustering might still make this a
cheaper option.
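This precision penalty is often quantified by the "design effect". The sketch below uses the standard approximation deff = 1 + (m − 1)ρ for equal-sized clusters of m elements, where ρ is the intraclass correlation; this formula and the numbers used are not from the text above, they are illustrative:

```python
def design_effect(cluster_size, icc):
    # Standard approximation for equal-sized clusters:
    # deff = 1 + (m - 1) * rho
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    # A simple random sample of roughly this size would give
    # the same precision as the clustered sample of size n
    return n / design_effect(cluster_size, icc)

print(round(design_effect(20, 0.05), 2))             # 1.95
print(round(effective_sample_size(1000, 20, 0.05)))  # 513
```

So under these assumed values, 1,000 clustered interviews carry about as much information as roughly 513 simple random ones, which is the trade-off the paragraph above describes.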

Cluster sampling is commonly implemented as multistage sampling. This
is a complex form of cluster sampling in which two or more levels of units are
embedded one in the other. The first stage consists of constructing the clusters
that will be used to sample from. In the second stage, a sample of primary units
is randomly selected from each cluster (rather than using all units contained in
all selected clusters). In following stages, in each of those selected clusters,
additional samples of units are selected, and so on. All ultimate units
(individuals, for instance) selected at the last step of this procedure are then
surveyed. This technique is thus essentially the process of taking random
subsamples of preceding random samples.
Multistage sampling can substantially reduce sampling costs in situations
where a complete population list would otherwise need to be constructed before
other sampling methods could be applied. By eliminating the work involved in
describing clusters that are not selected, multistage sampling can reduce the
large costs associated with traditional cluster sampling.[6] However, each
sample may not be fully representative of the whole population.
------------------------------------------------------------------------------------------
PROBABILITY-PROPORTIONAL-TO-SIZE SAMPLING

In some cases the sample designer has access to an "auxiliary variable" or
"size measure", believed to be correlated to the variable of interest, for each
element in the population. These data can be used to improve accuracy in
sample design. One option is to use the auxiliary variable as a basis for
stratification, as discussed above.

Another option is probability proportional to size ('PPS') sampling, in which
the selection probability for each element is set to be proportional to its size
measure, up to a maximum of 1. In a simple PPS design, these selection
probabilities can then be used as the basis for Poisson sampling. However, this
has the drawback of variable sample size, and different portions of the
population may still be over- or under-represented due to chance variation in
selections.

Systematic sampling theory can be used to create a probability-
proportional-to-size sample. This is done by treating each count within the size
variable as a single sampling unit. Samples are then identified by selecting at
even intervals among these counts within the size variable. This method is
sometimes called PPS-sequential or monetary unit sampling in the case of audits
or forensic sampling.

Example: Suppose we have six schools with populations of 150, 180, 200, 220,
260, and 490 students respectively (total 1500 students), and we want to use
student population as the basis for a PPS sample of size three. To do this, we
could allocate the first school numbers 1 to 150, the second school 151 to
330 (= 150 + 180), the third school 331 to 530, and so on to the last school
(1011 to 1500). We then generate a random start between 1 and 500 (equal
to 1500/3) and count through the school populations by multiples of 500. If our
random start was 137, we would select the schools which have been allocated
numbers 137, 637, and 1137, i.e. the first, fourth, and sixth schools.
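The school example can be reproduced directly in code. This sketch assumes the same interval (1500/3 = 500) and the same random start of 137; it reports selected schools as zero-based indices, so 0, 3, 5 correspond to the first, fourth, and sixth schools:

```python
def pps_systematic(sizes, n, start):
    """Systematic PPS: treat each count of the size variable as one
    sampling unit, then step through the cumulative counts at a fixed
    interval from a random start. Assumes sum(sizes) is divisible by n."""
    total = sum(sizes)
    interval = total // n  # 1500 // 3 = 500
    # Cumulative upper bounds: 150, 330, 530, 750, 1010, 1500
    bounds, cum = [], 0
    for s in sizes:
        cum += s
        bounds.append(cum)
    selected, point = [], start
    for _ in range(n):
        # The unit containing this point determines the selected element
        for i, b in enumerate(bounds):
            if point <= b:
                selected.append(i)
                break
        point += interval
    return selected

schools = [150, 180, 200, 220, 260, 490]
print(pps_systematic(schools, n=3, start=137))  # [0, 3, 5]
```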

The PPS approach can improve accuracy for a given sample size by
concentrating sample on large elements that have the greatest impact on
population estimates. PPS sampling is commonly used for surveys of businesses,
where element size varies greatly and auxiliary information is often available—
for instance, a survey attempting to measure the number of guest-nights spent
in hotels might use each hotel's number of rooms as an auxiliary variable. In
some cases, an older measurement of the variable of interest can be used as an
auxiliary variable when attempting to produce more current estimates.[6]
------------------------------------------------------------------------------------------
MULTISTAGE SAMPLING
From Wikipedia, the free encyclopedia

Multistage sampling refers to sampling plans where the sampling is
carried out in stages using smaller and smaller sampling units at each stage.[1]
Multistage sampling can be a complex form of cluster sampling. Cluster
sampling is a type of sampling which involves dividing the population into groups
(or clusters). Then, one or more clusters are chosen at random and everyone
within the chosen cluster is sampled.
Using all the sample elements in all the selected clusters may be
prohibitively expensive or unnecessary. Under these circumstances, multistage
cluster sampling becomes useful. Instead of using all the elements contained in
the selected clusters, the researcher randomly selects elements from each
cluster. Constructing the clusters is the first stage. Deciding what elements
within the cluster to use is the second stage. The technique is used frequently
when a complete list of all members of the population does not exist or would be
inappropriate to construct.
In some cases, several levels of cluster selection may be applied before
the final sample elements are reached. For example, household surveys
conducted by the Australian Bureau of Statistics begin by dividing metropolitan
regions into 'collection districts' and selecting some of these collection districts
(first stage). The selected collection districts are then divided into blocks, and
blocks are chosen from within each selected collection district (second stage).
Next, dwellings are listed within each selected block, and some of these
dwellings are selected (third stage). This method makes it unnecessary to create
a list of every dwelling in the region; such a list is needed only for the
selected blocks. In remote areas, an additional stage of clustering is used, in
order to reduce travel requirements.[2]
Although cluster sampling and stratified sampling bear some superficial
similarities, they are substantially different. In stratified sampling, a random
sample is drawn from every stratum, whereas in cluster sampling only the selected
clusters are studied, whether in single- or multi-stage form.

Advantages
- Lower cost and greater speed of conducting the survey
- Convenience of locating the survey sample
- Normally more accurate than cluster sampling for a sample of the same size

Disadvantages
- Less accurate than a simple random sample of the same size
- Further testing is more difficult
------------------------------------------------------------------------------------------
WHAT IS THE DIFFERENCE BETWEEN THE ATTRIBUTE AND
VARIABLE SAMPLING PLANS?
Sampling is a complex subject. It can be divided into analytic and enumerative
sampling. In general, analytic sampling tries to predict what is going to happen
(will the process stay the same, for example), and enumerative sampling tries to
determine something about an existing population (what is the percentage of bad
units in the shipment just received, for example). You need to determine the
purpose for taking the sample and what information will serve to accomplish the
aim.

The main difference in taking the samples is that for a variable sample,
measurements of a characteristic of interest are taken, while for an attribute
sample, one counts the number of units having or lacking specific properties
(mostly good/bad, or a number of flaws). Generally, attribute samples are much
larger than variable samples, and when the proportion of bad units (or flaws) is
very small, they need to be very large to be useful.

------------------------------------------------------------------------------------------

Comparison of attribute acceptance sampling plans and variables
acceptance sampling plans:

Attribute plans are generally easier to use than variables plans. A sample
of n units is selected randomly from a lot of N units. If there are c or fewer
defectives, accept the lot. If there are more than c defectives, reject the lot. For
example, suppose you have a shipment of 10,000 bolts. You will inspect 89 of
them. If there are 0, 1, or 2 defective bolts, then you may accept the shipment.
If there are more than 2 defectives, then reject the entire lot of bolts.

For variables sampling plans, you can only examine one measurement per
sampling plan. For example, if you need to inspect for wafer thickness and wafer
width, you need two separate sampling plans. Variables sampling plans assume
that the distribution of the quality characteristic is normal. However, the main
benefit from using variables data is that a variables sampling plan requires a
much smaller sample size than an attributes sampling plan.
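The attribute plan in the bolt example above (sample size n = 89, acceptance number c = 2) can be sketched as follows. The binomial acceptance-probability calculation and the 1% defect rate used to exercise it are illustrative additions, not part of the example above:

```python
from math import comb

def accept_lot(n_defectives, c=2):
    # Single-sampling attribute plan: accept the lot if the number
    # of defectives found in the sample is c or fewer
    return n_defectives <= c

def acceptance_probability(n, c, p):
    # Probability of accepting a lot with true defect rate p:
    # P(X <= c) for X ~ Binomial(n, p)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

print(accept_lot(2))  # True  (2 defectives: accept)
print(accept_lot(3))  # False (3 defectives: reject)
# Chance of accepting a shipment that is 1% defective under this plan
print(round(acceptance_probability(89, 2, 0.01), 3))
```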
------------------------------------------------------------------------------------------

DISCOVERY SAMPLING
A method of sampling used to assess whether the error rate in a population
exceeds a specified percentage. The sampling plan considers the population size,
the minimum unacceptable error rate, and the confidence level. If the sample
does not contain any errors, one can conclude, at the chosen confidence level,
that the actual error rate is below the minimum unacceptable rate.

Discovery sampling is a sampling plan which selects a sample of a given
size, accepts the population if the sample is error-free, and rejects the
population if it contains at least one error. With discovery sampling the auditor
may not be interested in determining how many errors there are in the
population. Where there is a possibility that the internal control system has
been circumvented, it may be sufficient to disclose one example to precipitate
further action or investigation.
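A common way to size such a plan is to find the smallest sample for which an error-free result would be improbable if the true error rate equalled the critical rate, i.e. the smallest n with (1 − p)^n ≤ 1 − confidence. The sketch below uses this standard calculation with illustrative numbers:

```python
from math import ceil, log

def discovery_sample_size(critical_rate, confidence):
    # Smallest n such that a sample with zero errors would occur with
    # probability at most (1 - confidence) when the true error rate
    # equals critical_rate: (1 - p)^n <= 1 - confidence
    return ceil(log(1 - confidence) / log(1 - critical_rate))

# 95% confidence of finding at least one error if 1% of items are wrong
print(discovery_sample_size(0.01, 0.95))  # 299
```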
------------------------------------------------------------------------------------------

HOW DOES MONETARY UNIT SAMPLING WORK?

Auditors use monetary unit sampling, also called probability-proportional-
to-size or dollar-unit sampling, to determine the accuracy of financial accounts.
With monetary unit sampling, each dollar in a transaction is a separate sampling
unit. A transaction for $40, for example, contains 40 sampling units. Auditors
usually use monetary unit sampling to sample and test accounts receivable,
loans receivable, and inventory.
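The selection step can be sketched by stepping through the cumulative dollar amounts at a fixed interval, so a transaction's chance of selection is proportional to its dollar value. The invoice amounts, interval, and random start below are hypothetical:

```python
def mus_select(amounts, sampling_interval, start):
    # Each dollar is one sampling unit; select the transaction that
    # contains every sampling_interval-th dollar, starting from `start`.
    # Transactions larger than the interval can be hit more than once.
    selected, cum, point = [], 0, start
    for i, amt in enumerate(amounts):
        cum += amt
        while point <= cum:
            selected.append(i)
            point += sampling_interval
    return selected

# Hypothetical ledger of invoice amounts in dollars (total $1,500)
invoices = [40, 250, 10, 700, 120, 380]
print(mus_select(invoices, sampling_interval=500, start=137))  # [1, 3, 5]
```

Notice that the $700 and $380 invoices are selected while the $10 one is skipped: larger balances are proportionally more likely to be tested, which is the point of the method.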

------------------------------------------------------------------------------------------
