Sampling and Sampling Distributions

Sampling frame
Sampling frame: list, map or directory from which to take a sample to represent the population.
- Frames that have over-registration contain all the target population units plus some additional units.
- Frames that have under-registration contain fewer units than in the target population.
- Sampling is done from the frame, not the target population.
- In theory, the target population and the frame are the same. In practice they often differ, so a researcher’s goal is to
minimise the differences between the frame and the target population.

Random versus nonrandom sampling

Random sampling: every unit of the population has a known probability of being selected in the sample.
This is sometimes called probability-based sampling.
- Can be analysed using probability theory and statistical theory.
- In its simplest form (simple random sampling), every unit of the population has an equal probability of being selected in the sample.
- Random sampling implies that chance governs the process of selection.

Nonrandom sampling: not every unit of the population has a known probability of being selected in the
sample. Sometimes nonrandom sampling is called nonprobability sampling.
- Members of nonrandom samples are not selected by chance. For example, they might be selected
because they are at the right place at the right time or because they know the people conducting
the research.
- Because units of the population have an unknown probability of being selected, it is impossible to
assign a probability of occurrence in nonrandom sampling.
- Nonrandom sampling methods are not appropriate techniques for gathering data to be analysed
by most inferential statistical methods, which rely on probability theory.

Random sampling techniques

Simple random sampling

Simple random sampling: every unit of the population has the same probability of being
selected in the sample.
- To conduct a simple random sample, each unit of the frame is numbered from 1 to N (where N is
the size of the population).
- A random number generator is used to select n items in the sample.
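
The numbering-and-selection procedure above can be sketched in Python. This is only a minimal illustration: the frame of 500 numbered units, the sample size of 20 and the function name simple_random_sample are assumptions made for the example, and the seed is there only so the run can be repeated.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each unit equally likely to be chosen."""
    rng = random.Random(seed)      # the seed only makes the illustration reproducible
    return rng.sample(frame, n)    # selection without replacement

# Hypothetical frame: units numbered 1 to N
N = 500
frame = list(range(1, N + 1))
sample = simple_random_sample(frame, n=20, seed=1)
print(sorted(sample))
```

Because random.sample selects without replacement, no unit can appear in the sample twice.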

Stratified random sampling

Stratified random sampling: the population is divided into non-overlapping subpopulations called strata.
The researcher then extracts a simple random sample from each of the subpopulations.
- Has the potential to reduce sampling error.
- Has greater potential to match the sample closely to the population.
- Portions of the total sample are taken from different population subgroups.
- More costly than simple random sampling.
- Each unit of the population must be assigned to a stratum before the random selection
process begins.
- Strata are selected based on available information.
- Strata are internally homogeneous.

Stratified random sampling can be either proportionate or disproportionate.

Proportionate: the proportion of the sample taken from each stratum reflects the proportion of each
stratum within the whole population
- For example, suppose voters are being surveyed in Perth and the sample is being stratified by
religion: Buddhist, Christian, Jewish, Muslim and others. If Perth’s population is 49% Christian and if a
sample of 1000 voters is taken, the sample would require inclusion of 490 Christians (0.49 × 1000) to achieve
proportionate stratification.
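
The allocation arithmetic in this example can be sketched as follows. Only the 49% Christian share and the sample size of 1000 come from the example. The shares for the other strata are invented purely so the code has complete input, and the function name proportionate_allocation is hypothetical.

```python
def proportionate_allocation(stratum_shares, sample_size):
    """Return how many units to draw from each stratum under proportionate stratification."""
    return {name: round(share * sample_size) for name, share in stratum_shares.items()}

# Only the Christian share (49%) is from the example; the other shares are made up.
shares = {
    "Buddhist": 0.05,
    "Christian": 0.49,
    "Jewish": 0.01,
    "Muslim": 0.04,
    "Other": 0.41,
}
allocation = proportionate_allocation(shares, 1000)
print(allocation["Christian"])  # 490
```

With less convenient shares, rounding can make the stratum counts add up to slightly more or less than the intended sample size, so the totals should be checked.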

Disproportionate: whenever the proportions of the strata in a sample are different from the proportions of
the strata in the population.

Systematic sampling
Systematic sampling: every kth item is selected to produce a sample of size n from a population of size N.
- k is sometimes called the sampling interval and is calculated as k = N/n.
- If k is not an integer value, its whole number value should be used.
- Systematic sampling methodology is based on the assumption that the source of population
elements is random.
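
A minimal sketch of systematic sampling, assuming k = N/n truncated to a whole number and a randomly chosen starting point within the first k units. The random start is a common convention and is not stated in these notes. The frame of 1000 numbered units and the function name systematic_sample are also assumptions.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every kth unit from the frame, beginning at a random start."""
    N = len(frame)
    k = N // n                    # whole-number part of N / n
    rng = random.Random(seed)
    start = rng.randrange(k)      # random index among the first k units (assumed convention)
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame: units numbered 1 to 1000
frame = list(range(1, 1001))
sample = systematic_sample(frame, n=50, seed=7)
print(sample[:5])
```

Here k = 1000 // 50 = 20, so consecutive selected units are always 20 positions apart.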

Cluster (or area) sampling

Cluster/area sampling: dividing the population into non-overlapping areas or clusters.
- Identifies clusters that tend to be internally heterogeneous, unlike stratified random sampling, where strata are internally homogeneous.
- Ideally, each cluster contains a wide variety of elements and is a miniature, or microcosm, of the population.
- E.g. towns, companies, homes, universities, areas of a city and geographical regions.
- After choosing the clusters, the researcher randomly selects individual elements into the sample
from the clusters.
- If the elements of a cluster are similar, cluster sampling may be statistically less efficient than
simple random sampling
- In an extreme case — when the elements of a cluster are the same — sampling from the
cluster may be no better than sampling a single unit from the cluster.
- The costs and problems of statistical analysis are greater with cluster or area sampling than
with simple random sampling.
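
The two steps described above (choose clusters, then randomly select elements from the chosen clusters) might look like this in Python. The towns, the household labels and the function name cluster_sample are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly choose clusters, then randomly choose elements within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical clusters: 10 towns with 100 households each
clusters = {
    f"town_{t}": [f"town_{t}_household_{h}" for h in range(100)]
    for t in range(10)
}
sample = cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=42)
for town, households in sample.items():
    print(town, households)
```

Only the three chosen towns need to be visited, which is the practical appeal of the method. If households within a town are very similar, the 15 sampled households carry less information than 15 households drawn from the whole population.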

Two-stage sampling: when clusters are too large, a second set of clusters is taken from each original cluster.

Nonrandom sampling
Nonrandom sampling techniques: techniques used to select elements from a population by any
mechanism that does not involve a random selection process.
- Convenience sampling: elements for the sample are selected for the convenience of the
researcher. The researcher typically chooses elements that are readily available, nearby or willing
to participate
- Judgement sampling: occurs when elements selected for the sample are chosen by the
judgement of the researcher.
- Quota sampling: certain population subclasses, such as age group, gender and geographical
region, are used as strata; the researcher uses a nonrandom sampling method to gather data from
one stratum until the desired quota of samples is filled. Quotas are described by quota controls,
which set the sizes of the samples to be obtained from the subgroups.
- Snowball sampling: survey subjects are selected based on referral from other survey respondents.

Types of errors from collecting sample data

Sampling error
Sampling error: difference between the value computed from a sample (a statistic) and the corresponding
value for the population (a parameter).
- Occurs because the sampling process involves selecting a subset of the population and not the
entire population.
- Sampling error can be reduced by taking a larger sample or by using stratified random sampling.
- This has to be weighed against the extra cost involved in doing so.
- Sampling error formulas can be derived for each of the random sampling designs, which can then
be incorporated into the expression for the sampling distribution.
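
The effect of sample size on sampling error can be illustrated with a small simulation. Everything here is assumed for the sake of the example: the made-up population of 10 000 values, the sample sizes of 25 and 400, the 200 repetitions and the function name mean_abs_sampling_error.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, repetitions, rng):
    """Average |sample mean - population mean| over repeated simple random samples."""
    mu = statistics.fmean(population)            # the parameter
    errors = []
    for _ in range(repetitions):
        sample = rng.sample(population, n)
        x_bar = statistics.fmean(sample)         # the statistic
        errors.append(abs(x_bar - mu))
    return statistics.fmean(errors)

rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(10_000)]   # made-up population
small_n_error = mean_abs_sampling_error(population, 25, 200, rng)
large_n_error = mean_abs_sampling_error(population, 400, 200, rng)
print(small_n_error, large_n_error)
```

The average error for n = 400 should come out clearly smaller than for n = 25, but it does not reach zero: some sampling error remains whenever only part of the population is observed.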

Nonsampling errors
- Missing data
- Recording errors
- Input processing errors and analysis errors.
- Errors of unclear definitions
- Defective questionnaires
- Poorly conceived concepts.

Sampling distribution of the sample mean

To analyse the sample statistic, it is essential to know the distribution of the statistic.

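Sampling distribution: the distribution of a sample statistic (here, the sample mean) across all possible samples of the same size, i.e. the way sample means are spread out when plotted.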
- Samples that are randomly selected, even when selected and not replaced into the population,
have a range of sample mean values that are spread out and follow some kind of distribution.

Central limit theorem

If samples of size n are drawn randomly from a population with a mean 𝜇 and standard deviation 𝜎, the
sample means are approximately normally distributed for sufficiently large samples (n ≥ 30) regardless of
the shape of the population variable distribution. If the population variable is normally distributed, the
sample means are normally distributed for any sample size.

Standard error of the mean: standard deviation of the sample means, equal to the population standard deviation divided by the square
root of the sample size.
$$ SE_{\bar{x}} = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} $$
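
A rough simulation can make the central limit theorem and the standard error concrete. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1), samples of size n = 30 and 5000 repetitions. All of these choices are illustrative and none come from the notes.

```python
import math
import random
import statistics

rng = random.Random(0)
n = 30                 # sample size
num_samples = 5000     # number of repeated samples

# Population assumed exponential with rate 1: mean 1, standard deviation 1, right-skewed
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theoretical_se = 1.0 / math.sqrt(n)    # sigma / sqrt(n) with sigma = 1

print(mean_of_means, sd_of_means, theoretical_se)
```

The mean of the sample means should land close to the population mean of 1, and their standard deviation close to 1/√30 ≈ 0.18, even though the population itself is far from normal.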

Central limit theorem → as long as a sample size of 30 or more is selected from any shaped population
distribution, the sampling distribution of the sample mean will be approximately normally distributed.
- Creates potential to apply our knowledge of the normal distribution to many problems when the
sample size is sufficiently large.
- Used when the distribution of a variable in a population is unknown.
- To find the mean of the sample means, $\mu_{\bar{x}}$, without using a mathematical derivation, all possible samples of the same size would
have to be selected randomly from a population. Then each sample average would have to be
calculated. Finally, the average of all the sample averages would be calculated to find $\mu_{\bar{x}}$.
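
For a very small population, the brute-force route described in the last bullet can actually be carried out. The sketch below uses a made-up population of six values and lists every possible sample of size 3, drawn without replacement.

```python
from itertools import combinations
from statistics import fmean

population = [2, 4, 6, 8, 10, 12]   # made-up population, N = 6
n = 3

# Every possible sample of size n (without replacement) and its sample mean
sample_means = [fmean(sample) for sample in combinations(population, n)]

mu = fmean(population)          # population mean
mu_x_bar = fmean(sample_means)  # mean of all the sample means

print(len(sample_means), mu, mu_x_bar)
```

There are 20 possible samples, and the average of their 20 means equals the population mean of 7. For realistic population sizes the number of possible samples is far too large to list, which is why the result that the mean of the sample means equals 𝜇 is used instead.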

Using the central limit theorem with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = SE_{\bar{x}} = \sigma / \sqrt{n}$, and then making these substitutions into the
z formula for the sampling distribution of the sample means, gives the z formula for sample means:

$$ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$

- When a population variable is normally distributed and the sample size is one (n = 1), the z formula
for the sampling distribution of the sample means becomes exactly the same as the z formula for
individual values in the population.
- If n = 1, the sample mean of a single value is the same as that value, and the value of $SE_{\bar{x}} = \sigma / \sqrt{1} = \sigma$.
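
A short worked example of the z formula for sample means, using Python's standard library. The numbers are hypothetical: a population mean of 85, a population standard deviation of 9, a sample of n = 40 and the question "what is the probability that the sample mean is at least 87?".

```python
import math
from statistics import NormalDist

def z_for_sample_mean(x_bar, mu, sigma, n):
    """z value of a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

mu, sigma, n = 85, 9, 40          # hypothetical population values and sample size
z = z_for_sample_mean(87, mu, sigma, n)
p_at_least_87 = 1 - NormalDist().cdf(z)   # upper-tail area of the standard normal

print(round(z, 2), round(p_at_least_87, 3))
```

Since n ≥ 30, the normal approximation is reasonable here whatever the shape of the population. Calling z_for_sample_mean with n = 1 gives (x − 𝜇)/𝜎, the z formula for an individual value.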

Sampling from a finite population

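In cases of finite populations, a statistical adjustment can be made to the z formula for
sample means, known as the finite population correction (fpc) factor, $\sqrt{(N - n)/(N - 1)}$:

$$ z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}} $$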

Whenever researchers are working with a finite population they can use the fpc factor.
- Many researchers check to see if the sample size is less than 5% of the finite population size
(meaning n/N < 0.05).
- If this is the case, then the fpc factor will have little effect on the standard error of the mean
and so can be disregarded in any calculations.
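
The 5% rule of thumb can be checked numerically. The sketch below uses the standard fpc factor √((N − n)/(N − 1)). The population sizes, sample size and 𝜎 = 10 are hypothetical, as are the function names.

```python
import math

def fpc_factor(N, n):
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error_finite(sigma, n, N):
    """Standard error of the mean adjusted for a finite population of size N."""
    return sigma / math.sqrt(n) * fpc_factor(N, n)

# Hypothetical cases: the same sample size drawn from a large and a small population
small_fraction = fpc_factor(N=2000, n=50)   # n/N = 2.5%
large_fraction = fpc_factor(N=200, n=50)    # n/N = 25%
print(small_fraction, large_fraction)
print(standard_error_finite(sigma=10, n=50, N=200))
```

When the sample is 2.5% of the population the factor is close to 1 and barely changes the standard error. At 25% it shrinks the standard error noticeably, so it should not be ignored.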

Sampling distribution of the sample proportion, p̂

If a sample involving a categorical variable is analysed, the frequency of occurrence of a particular
category can be found. If a researcher uses a sample to analyse a categorical variable, the statistic called the
sample proportion, denoted $\hat{p}$, can be used.

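Sample proportion: ratio that is calculated by dividing the frequency at which a given characteristic occurs in a
sample by the total number of items in the sample:

$$ \hat{p} = \frac{x}{n} $$

where x is the number of items in the sample that have the characteristic and n is the sample size.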

- The central limit theorem applies to sample proportions in that the normal distribution
approximates the shape of the distribution of sample proportions.
- The approximation is considered adequate when both np > 5 and nq > 5 (where p is the population proportion and q = 1 − p).
- The mean of all possible sample proportions of the same size n randomly drawn from a population
is p (the population proportion).

Standard error of the proportion: standard deviation of this distribution of sample proportions:

$$ SE_{\hat{p}} = \sigma_{\hat{p}} = \sqrt{\frac{p \, q}{n}} $$
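
A small sketch of how the standard error of the proportion, √(pq/n), is used. It relies on the z formula for sample proportions, z = (p̂ − p)/√(pq/n), which is the standard analogue of the z formula for sample means and is assumed here rather than taken from the notes. The values p = 0.10, n = 80 and the threshold of 0.15 are hypothetical.

```python
import math
from statistics import NormalDist

def standard_error_of_proportion(p, n):
    """Standard deviation of the sampling distribution of p-hat: sqrt(p * q / n)."""
    q = 1 - p
    return math.sqrt(p * q / n)

def normal_approximation_ok(p, n):
    """Rule of thumb from the notes: both n*p and n*q must exceed 5."""
    return n * p > 5 and n * (1 - p) > 5

p, n = 0.10, 80                       # hypothetical population proportion and sample size
se = standard_error_of_proportion(p, n)
z = (0.15 - p) / se                   # z value for a sample proportion of 0.15
prob_at_least_15pct = 1 - NormalDist().cdf(z)

print(normal_approximation_ok(p, n), round(se, 4), round(z, 2))
```

Here np = 8 and nq = 72, so the rule of thumb is satisfied and the normal approximation can be used to find the probability of a sample proportion of at least 0.15.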

Key Equations

