What Is Statistical Sampling?
These kinds of questions are huge in the sense that they require us to
keep track of millions of individuals. Statistics simplifies these problems by using
a technique called sampling. By conducting a statistical sample, our workload
can be cut down immensely. Rather than tracking the behaviours of billions or
millions, we only need to examine those of thousands or hundreds. As we will
see, this simplification comes at a price.
Samples
As the saying goes, “Well begun is half done.” To ensure that our
statistical studies and experiments have good results, we need to plan and start
them carefully. It’s easy to come up with bad statistical samples. Good simple
random samples require some work to obtain. If our data has been obtained
haphazardly and in a cavalier manner, then no matter how sophisticated our
analysis, statistical techniques will not give us any worthwhile conclusions.
------------------------------------------------------------------------------------------
Sampling (statistics)
From Wikipedia, the free encyclopedia
Population definition
Advantages of sampling
1. Very accurate.
2. Economical in nature.
3. Very reliable.
4. Well suited to many different kinds of surveys.
5. Takes less time.
6. When the universe (population) is very large, sampling is the only
practical method for collecting the data.
Disadvantages of sampling
1. Inadequacy of the samples.
2. Chances for bias.
3. Problems of accuracy.
4. Difficulty of getting the representative sample.
5. Untrained manpower.
6. Absence of the informants.
7. Chance of errors in carrying out the sampling.
RANDOM NUMBER TABLE
Random number tables have been used in statistics for tasks such as
selecting random samples. This was much more effective than manually selecting
the random samples (with dice, cards, etc.). Nowadays, tables of random
numbers have been replaced by computational random number generators.
Note that any published (or otherwise accessible) random data table is
unsuitable for cryptographic purposes since the accessibility of the numbers
makes them effectively predictable, and hence their effect on a cryptosystem is
also predictable. By way of contrast, genuinely random numbers that are only
accessible to the intended encoder and decoder allow literally unbreakable
encryption of a similar or lesser amount of meaningful data (using a simple
exclusive OR operation) in a method known as the one-time pad. In practice,
however, this method often faces insurmountable barriers to correct
implementation.
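The exclusive OR step mentioned above can be illustrated with a minimal sketch. The helper name `xor_bytes` and the sample message are assumptions made for illustration only, and the key here comes from Python's `secrets` module rather than from any published table.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key byte at the same position."""
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"sample 89 bolts"  # hypothetical plaintext
# The key must be secret, random, as long as the message, and never reused.
key = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, key)
recovered = xor_bytes(ciphertext, key)  # XOR with the same key undoes the encryption
print(recovered == message)
```

The practical barriers noted above show up even in this sketch: the key is as long as the data, and it has to reach the decoder secretly.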
Random sampling
Random sampling, and its derived terms such as sampling error, imply
specific procedures for gathering and analyzing data that are rigorously applied
as a method for arriving at results considered representative of a given
population as a whole. Despite a common misunderstanding, "random" does not
mean the same thing as "chance" as this idea is often used in describing
situations of uncertainty, nor is it the same as projections based on an assessed
probability or frequency. Sampling always refers to a procedure of gathering
data from a small aggregation of individuals that is purportedly representative of
a larger grouping which must in principle be capable of being measured as a
totality. Random sampling is used precisely to ensure a truly representative
sample from which to draw conclusions, in which the same results would be
arrived at if one had included the entirety of the population instead. Random
sampling (and sampling error) can only be used to gather information about a
single defined point in time. If additional data is gathered (other things
remaining constant) then comparison across time periods may be possible.
However, this comparison is distinct from any sampling itself. As a method for
gathering data within the field of statistics, random sampling is recognized as
clearly distinct from the causal process that one is trying to measure. The
conducting of research itself may lead to certain outcomes affecting the
researched group, but this effect is not what is called sampling error. Sampling
error always refers to the recognized limitations of any supposedly
representative sample population in reflecting the larger totality, and the error
refers only to the discrepancy that may result from judging the whole on the
basis of a much smaller number. This is only an "error" in the sense that it would
automatically be corrected if the totality were itself assessed. The term has no
real meaning outside of statistics.
According to a differing view, a potential example of a sampling error in
evolution is genetic drift: a change in a population's allele frequencies due to
chance. One example is the bottleneck effect, in which a natural disaster
dramatically reduces the size of a population, leaving a small population that
may or may not fairly represent the original one. What may make the bottleneck
effect a sampling error is that, because of the disaster, certain alleles become
more common while others may disappear completely. Another example of genetic
drift that is a potential sampling error is the founder effect, in which a few
individuals from a larger population settle a new isolated area. In this
instance, there are only a few individuals with little genetic variety, making
it a potential sampling error. [2]
The likely size of the sampling error can generally be controlled by taking
a large enough random sample from the population,[3] although the cost of doing
this may be prohibitive; see sample size and statistical power for more detail. If
the observations are collected from a random sample, statistical theory provides
probabilistic estimates of the likely size of the sampling error for a particular
statistic or estimator. These are often expressed in terms of its standard error.
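As a rough illustration of how the standard error shrinks as the random sample grows, the sketch below estimates the standard error of a sample mean as s / sqrt(n). The population values, the seed and the function name are assumptions made for the example, not data from any real survey.

```python
import math
import random
import statistics

def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

rng = random.Random(0)  # fixed seed, only so the sketch is repeatable
# Hypothetical population of 100,000 measurements.
population = [rng.gauss(50, 10) for _ in range(100_000)]

for n in (100, 400, 1600):
    sample = rng.sample(population, n)  # simple random sample without replacement
    print(n, round(standard_error_of_mean(sample), 3))
```

Each fourfold increase in sample size should roughly halve the standard error. This is why precision becomes expensive: halving the error means quadrupling the sample.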
Bias problems
Non-sampling error
NON-SAMPLING ERROR
------------------------------------------------------------------------------------------
SIMPLE RANDOM SAMPLING
In a simple random sample (SRS) of a given size, all such subsets of the
frame are given an equal probability. Furthermore, any given pair of elements
has the same chance of selection as any other such pair (and similarly for
triples, and so on). This minimises bias and simplifies analysis of results. In
particular, the variance between individual results within the sample is a good
indicator of variance in the overall population, which makes it relatively easy to
estimate the accuracy of results.
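A simple random sample can be drawn with a computational random number generator. The following is a minimal sketch, assuming the frame is available as a list; the frame of 1,000 house numbers and the function name are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

frame = list(range(1, 1001))  # hypothetical frame: house numbers 1..1000
sample = simple_random_sample(frame, 100, seed=42)
print(sorted(sample)[:10])
```

The `seed` argument is there only to make a draw repeatable; a real study would normally leave it unset.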
For example, suppose we wish to sample people from a long street that
starts in a poor area (house No. 1) and ends in an expensive district (house No.
1000). A simple random selection of addresses from this street could easily end
up with too many from the high end and too few from the low end (or vice
versa), leading to an unrepresentative sample. Selecting (e.g.) every 10th street
number along the street, an approach known as systematic sampling, ensures that
the sample is spread evenly along the length of the street, representing all of
these districts. (Note that if we always start at house #1 and end at #991, the
sample is slightly biased towards the low end; by randomly selecting the start
between #1 and #10, this bias is eliminated.)
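The every-10th-house selection with a random start can be sketched as follows. This is a minimal illustration, assuming the street is represented as a list of house numbers; the function name is hypothetical.

```python
import random

def systematic_sample(frame, step, seed=None):
    """Take every `step`-th element, starting at a random offset in the first interval."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # index 0 .. step-1, i.e. one of the first `step` elements
    return frame[start::step]

houses = list(range(1, 1001))  # house numbers 1..1000, as in the street example
sample = systematic_sample(houses, 10, seed=1)
print(sample[0], sample[-1], len(sample))
```

Whatever the random start, the sample contains 100 houses spaced exactly 10 apart, so both ends of the street are covered.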
STRATIFIED SAMPLING
[Figure: selecting a random sample using the stratified sampling technique]
First, dividing the population into distinct, independent strata can enable
researchers to draw inferences about specific subgroups that may be lost in a
more generalized random sample.
Third, it is sometimes the case that data are more readily available for
individual, pre-existing strata within a population than for the overall population;
in such cases, using a stratified sampling approach may be more convenient
than aggregating data across groups (though this may potentially be at odds
with the previously noted importance of utilizing criterion-relevant strata).
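One way to make sure every stratum is represented is to draw a separate random sample within each stratum. The sketch below assumes proportional allocation (the same sampling fraction in every stratum), which is only one of several possible allocation rules; the population, the region labels and the function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Sample each stratum separately so no subgroup can be missed by chance.
    return {
        name: rng.sample(members, max(1, round(fraction * len(members))))
        for name, members in strata.items()
    }

# Hypothetical population: (person_id, region) pairs with unequal regions.
population = (
    [(i, "north") for i in range(600)]
    + [(i, "south") for i in range(600, 900)]
    + [(i, "east") for i in range(900, 1000)]
)
sample = stratified_sample(population, lambda u: u[1], 0.1, seed=7)
print({name: len(chosen) for name, chosen in sample.items()})
```

Because each stratum is sampled on its own, inferences about a small subgroup such as "east" do not depend on how many of its members a single overall draw happened to include.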
Disadvantages
------------------------------------------------------------------------------------------
CLUSTER SAMPLING
[Figure: selecting a random sample using the cluster sampling technique]
Cluster elements
Multistage sampling also exists, in which more than two steps are
taken in selecting clusters from clusters.
Aspects of cluster sampling
One version of cluster sampling is area sampling or geographical
cluster sampling. Clusters consist of geographical areas. Because a
geographically dispersed population can be expensive to survey, greater
economy than simple random sampling can be achieved by treating several
respondents within a local area as a cluster. It is usually necessary to increase
the total sample size to achieve equivalent precision in the estimators, but cost
savings may make that feasible.
In some situations, cluster sampling is only appropriate when the clusters
are approximately the same size. This can be achieved by combining clusters. If
this is not possible, probability proportionate to size sampling is used. In
this method, the probability of selecting any cluster varies with the size of the
cluster, giving larger clusters a greater probability of selection and smaller
clusters a lower probability. However, if clusters are selected with probability
proportionate to size, the same number of interviews should be carried out in
each sampled cluster so that each unit sampled has the same probability of
selection.
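The reason for interviewing the same number of units in each sampled cluster can be checked with a little arithmetic. A unit's overall probability is the probability that its cluster is chosen times the probability that it is chosen within the cluster, and the cluster size cancels out. The cluster sizes below are hypothetical, and the sketch assumes no cluster is so large that its selection probability would exceed 1.

```python
from fractions import Fraction

def unit_selection_probability(cluster_size, total_size, clusters_drawn, interviews_per_cluster):
    """Overall selection probability of one unit under PPS cluster selection."""
    # Stage 1: the cluster is chosen with probability proportional to its size.
    p_cluster = Fraction(clusters_drawn * cluster_size, total_size)
    # Stage 2: a fixed number of units is interviewed inside the chosen cluster.
    p_unit_given_cluster = Fraction(interviews_per_cluster, cluster_size)
    return p_cluster * p_unit_given_cluster

cluster_sizes = [100, 200, 300, 400]  # hypothetical clusters, 1000 units in total
total = sum(cluster_sizes)
for size in cluster_sizes:
    print(size, unit_selection_probability(size, total, 2, 10))
```

Every unit ends up with the same probability, (clusters drawn x interviews per cluster) / total population, regardless of the size of its cluster.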
Cluster sampling is used to estimate high mortalities in cases such as
wars, famines and natural disasters.[1]
Advantages
- Can be cheaper than other methods – e.g. fewer travel expenses,
administration costs.
- Feasibility: This method takes large populations into account. Since these
groups are so large, deploying any other technique would be a very difficult task.
It is feasible only when you are dealing with a large population.
- Economy: The two major expenses, i.e., traveling and listing, are greatly
reduced in this method. For example, compiling research information about every
household in a city would be very difficult, whereas compiling information about
various blocks of the city will be easier. Here, traveling as well as listing
efforts will be greatly reduced.
- Reduced Variability: When you are considering the estimates by any other
method, reduced variability in results is observed. This may not be an ideal
situation every time.
Disadvantages
- Higher sampling error, which can be expressed in the so-called "design effect",
the ratio between the number of subjects in the cluster study and the number of
subjects in an equally reliable, randomly sampled unclustered study. [2]
- Biased Samples: If the group in the population that is chosen as a sample has
a biased opinion, then the entire population is inferred to have the same
opinion. This may not be the actual case.
- Errors: The other probabilistic methods give fewer errors than this method. For
this reason, it is discouraged for beginners.
------------------------------------------------------------------------------------------
It also means that one does not need a sampling frame listing all
elements in the target population. Instead, clusters can be chosen from a
cluster-level frame, with an element-level frame created only for the selected
clusters. In the example above, the sample only requires a block-level city map
for initial selections, and then a household-level map of the 100 selected blocks,
rather than a household-level map of the whole city.
Example: Suppose we have six schools with populations of 150, 180, 200, 220,
260, and 490 students respectively (total 1500 students), and we want to use
student population as the basis for a PPS sample of size three. To do this, we
could allocate the first school numbers 1 to 150, the second school 151 to
330 (= 150 + 180), the third school 331 to 530, and so on to the last school
(1011 to 1500). We then generate a random start between 1 and 500 (equal
to 1500/3) and count through the school populations by multiples of 500. If our
random start was 137, we would select the schools which have been allocated
numbers 137, 637, and 1137, i.e. the first, fourth, and sixth schools.
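The school example can be reproduced with a short sketch of this random-start, fixed-interval selection. The function name and its parameters are assumptions for illustration; the school sizes and the random start of 137 are taken from the example.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(sizes, n, start=None, seed=None):
    """Select n units with probability proportional to size, using a random
    start and a fixed interval over the cumulative sizes.
    Returns 0-based indices of the selected units."""
    total = sum(sizes)
    interval = total // n  # 1500 // 3 = 500 in the school example
    if start is None:
        start = random.Random(seed).randint(1, interval)
    cumulative = list(accumulate(sizes))  # upper end of each unit's number range
    points = [start + i * interval for i in range(n)]
    # bisect_left finds the first unit whose upper bound is >= the point.
    return [bisect_left(cumulative, p) for p in points]

schools = [150, 180, 200, 220, 260, 490]
print([i + 1 for i in pps_systematic(schools, 3, start=137)])  # [1, 4, 6]
```

With a start of 137 the selection points are 137, 637 and 1137, which fall in the ranges of the first, fourth and sixth schools. In this sketch a unit larger than the interval could be hit more than once; that does not happen here because the largest school (490) is smaller than the interval (500).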
The PPS approach can improve accuracy for a given sample size by
concentrating sample on large elements that have the greatest impact on
population estimates. PPS sampling is commonly used for surveys of businesses,
where element size varies greatly and auxiliary information is often available—
for instance, a survey attempting to measure the number of guest-nights spent
in hotels might use each hotel's number of rooms as an auxiliary variable. In
some cases, an older measurement of the variable of interest can be used as an
auxiliary variable when attempting to produce more current estimates. [6]
------------------------------------------------------------------------------------------
MULTISTAGE SAMPLING
Advantages
- Cost and speed with which the survey can be done
- Convenience of finding the survey sample
- Normally more accurate than cluster sampling for the same size sample
Disadvantages
- Not as accurate as a simple random sample of the same size
- More testing is difficult to do
------------------------------------------------------------------------------------------
WHAT IS THE DIFFERENCE BETWEEN THE ATTRIBUTE AND
VARIABLE SAMPLING PLANS?
Sampling is a complex subject. It can be divided into analytic and enumerative.
In general, analytic sampling tries to predict what is going to happen (will the
process stay the same, for example), and enumerative sampling tries to
determine something about an existing population (what is the percent bad in
the shipment just received, for example). You need to determine the purpose for
taking the sample and what information will serve to accomplish the aim.
The main difference in taking the samples is that for a variable sample,
measurements of a characteristic of interest are taken, and for an attribute
sample, one counts the number of units having or not having specific properties
(mostly good/bad or number of flaws). Generally, attribute samples are much
larger than variable samples and, to be useful, need to be very large when the
proportion of bad units (or flaws) is very small.
------------------------------------------------------------------------------------------
Attribute plans are generally easier to use than variables plans. A sample
of n units is selected randomly from a lot of N units. If there are c or fewer
defectives, accept the lot. If there are more than c defectives, reject the lot. For
example, suppose you have a shipment of 10,000 bolts. You will inspect 89 of
them. If there are 0, 1, or 2 defective bolts, then you may accept the shipment.
If there are more than 2 defectives, then reject the entire lot of bolts.
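To see what a plan such as n = 89, c = 2 implies, one can compute the probability of accepting a lot for different true defect rates. The sketch below uses the binomial approximation, which is reasonable here because the sample (89) is small relative to the lot (10,000). The defect rates tried and the function name are assumptions for illustration.

```python
from math import comb

def acceptance_probability(n, c, p):
    """P(at most c defectives in a sample of n), binomial approximation."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))

# Hypothetical true defect rates for the shipment of bolts.
for p in (0.005, 0.01, 0.03, 0.05):
    print(f"defect rate {p:.3f}: P(accept) = {acceptance_probability(89, 2, p):.3f}")
```

The acceptance probability falls as the true defect rate rises. Plotted against the defect rate, it gives the plan's operating characteristic curve.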
For variables sampling plans, you can only examine one measurement per
sampling plan. For example, if you need to inspect for wafer thickness and wafer
width, you need two separate sampling plans. Variables sampling plans assume
that the distribution of the quality characteristic is normal. However, the main
benefit from using variables data is that a variables sampling plan requires a
much smaller sample size than an attributes sampling plan.
------------------------------------------------------------------------------------------
DISCOVERY SAMPLING
A method of sampling to assess whether the error rate in a population
exceeds a specified percentage. The sample size depends on the population size,
the minimum unacceptable error rate and the confidence level. If the sample does
not contain any errors, then one can conclude, at the stated confidence level,
that the actual error rate is below the minimum unacceptable rate.
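A rough sample size for discovery sampling can be worked out from the confidence level and the minimum unacceptable error rate. The sketch below assumes the population is large enough to ignore its size (an actual plan would adjust for a finite population); the function name is hypothetical. It finds the smallest n for which a sample with zero errors would be unlikely if the true error rate were at the unacceptable level.

```python
import math

def discovery_sample_size(max_error_rate, confidence):
    """Smallest n with (1 - max_error_rate)**n <= 1 - confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_error_rate))

# 95% confidence that the error rate is below 1% if no errors are found.
print(discovery_sample_size(0.01, 0.95))  # 299
```

If a sample of that size contains no errors, an error rate of 1% or more would have produced such a clean sample less than 5% of the time.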
------------------------------------------------------------------------------------------