

Week 2, Part 1: Sampling
Sampling, Classifying and Graphing Data to get …
How do we get data?

We can gather information from the whole population or from a part of it.
When we gather information from the entire population, we perform a census. A census requires that we know how to access every item in the population, which means we must be able to list every item in the population. Unfortunately, this is usually very hard to do.

Why do we sample?

We use samples because:
- we don't have access to the whole population, because it is an ongoing, recurring process in which new items are generated constantly;
- the cost of contacting the entire population would be too great in terms of time and money;
- the items selected would be destroyed in the process of getting their information (think of testing the lifetime of light bulbs);
- a census may require repetitive record-keeping tasks, making it prone to errors caused by boredom or fatigue. In such circumstances, a sample can be more accurate than a census.
Non-Probability Samples

[Diagram: sampling methods. Probability sampling (including pseudo-random samples); non-probability sampling (convenience and judgment samples).]

Non-probability samples are characterized by some population items having a zero probability of being included in the sample. Examples:
- convenience sample
- judgment sample
Such samples introduce bias into the data and are not an appropriate basis for statements about the population at large. Bias means that as the sample gets larger, it does not come to resemble the population more and more: the items that can never be selected stay missing no matter how many items we add.
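A small simulation can make the bias claim concrete. The sketch below uses made-up numbers: a hypothetical population of the values 1 to 1,000 in which, as an illustrative assumption, only the 600 smallest values are "easy to reach". The convenience sample can never include the other 400 items. As the sample size grows, the convenience-sample mean should settle near 300.5 (the mean of the reachable items) rather than the population mean of 500.5. The simple-random-sample mean should move toward 500.5.

```python
import random
import statistics

# Hypothetical population of 1,000 measurements with values 1, 2, ..., 1000.
POPULATION = list(range(1, 1001))
# Illustrative assumption: only the items with value <= 600 are "easy to reach".
# The other 400 items have ZERO probability of entering the convenience sample.
REACHABLE = [x for x in POPULATION if x <= 600]


def convenience_sample(n, rng):
    """Randomly pick n items, but only from the reachable part of the population."""
    return rng.sample(REACHABLE, n)


def simple_random_sample(n, rng):
    """Pick n items so that every population item has the same chance."""
    return rng.sample(POPULATION, n)


def compare(sizes, seed=1):
    """Return (n, convenience-sample mean, simple-random-sample mean) per size."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    rows = []
    for n in sizes:
        conv = statistics.mean(convenience_sample(n, rng))
        srs = statistics.mean(simple_random_sample(n, rng))
        rows.append((n, conv, srs))
    return rows


if __name__ == "__main__":
    print("population mean:", statistics.mean(POPULATION))
    for n, conv, srs in compare([10, 100, 500]):
        print(f"n={n:3d}  convenience mean={conv:6.1f}  simple random mean={srs:6.1f}")
```

Even if the convenience sample took every reachable item (n = 600), its mean would be exactly 300.5. The gap comes from the items that can never be chosen, so drawing more items does not close it.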
Probability Samples

These sampling methods are characterized by every item in the population having a non-zero chance of being in the sample.
A very special probability sampling method results in a simple random sample: all items in the population from which the sample is drawn have the SAME non-zero chance of being selected.
Simple Random Sample

Unbiased: as the sample gets larger, it comes to resemble the population more and more.
Independent: selecting any item from the population has no effect on the selection of any other item in the sample.
The statistics gold standard: all the inferential techniques we will learn depend on the sample being a simple random sample.
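In Python, a simple random sample can be drawn with the standard library's `random.sample`, which selects items without replacement. The sketch below uses that function. It also adds a rough empirical check of the "same chance" property: if we repeat the draw many times, each item should appear in about n/N of the samples. The helper names and the list of 150 numbered items are hypothetical.

```python
import random
from collections import Counter


def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def inclusion_frequencies(population, n, trials, seed=0):
    """Estimate, for each item, the fraction of repeated samples that contain it."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(rng.sample(population, n))
    return {item: counts[item] / trials for item in population}


if __name__ == "__main__":
    items = list(range(1, 151))  # a hypothetical numbered list of 150 items
    print(sorted(simple_random_sample(items, 10, seed=42)))
    freqs = inclusion_frequencies(items, 10, trials=20_000)
    # each frequency should be close to 10/150, roughly 0.067
    print(min(freqs.values()), max(freqs.values()))
```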

Pseudo-Random Samples

Pseudo-random samples are used when simple random samples are too complicated or too costly, or when a particular situation calls for a special method:
- the systematic random sample
- the stratified random sample
- the random cluster sample
These methods are not considered as pure as the simple random sample, but they have many good qualities.
Systematic random samples:
The first observation is chosen at random; the remaining observations are drawn at intervals determined by the size of the sample desired. With N items in the population and a desired sample size of n, the process is: 1) select a sample size n, 2) calculate N/n to get the interval, 3) randomly select a starting point and choose the observation at that point, 4) select the next observation, N/n observations further along the list, and 5) continue until you have selected n observations.
A systematic example

I have a list of 150 items that are in numerical order, and I would like a sample of 10.
N/n = 150/10 = 15.
The randomly selected starting item is #92. Go to the item 15 later, 92 + 15 = 107, and select that item. Continue in steps of 15. When a step runs past the end of the list, wrap around to the beginning (137 + 15 = 152, and 152 - 150 = 2).
Here is the complete sample:

Location   Value
92         28.75517
107        28.9428
122        28.96165
137        28.97136
2          24.6281
17         27.08843
32         27.26721
47         27.41082
62         27.58107
77         28.46
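The process and the example above can be sketched in Python as follows. This version wraps around to the start of the list, which is what the example does (137 + 15 = 152 becomes item 2). This is sometimes called circular systematic sampling. Many textbooks instead pick the random start from the first N/n items, which avoids wrapping. The function names are hypothetical, and positions are 1-based to match the "Location" column.

```python
import random


def systematic_positions(N, n, start=None, seed=None):
    """Return n 1-based positions spaced k = N // n apart, wrapping past the end.

    If start is None, it is chosen uniformly at random from 1..N.
    """
    k = N // n  # the interval N/n, rounded down if it is not a whole number
    if start is None:
        start = random.Random(seed).randint(1, N)
    # (start - 1 + i * k) % N steps forward i intervals and wraps to the front;
    # adding 1 converts back to a 1-based position. Because n * k <= N,
    # the n positions are all distinct.
    return [(start - 1 + i * k) % N + 1 for i in range(n)]


def systematic_sample(items, n, start=None, seed=None):
    """Select the items at the systematic positions (items is an ordinary list)."""
    return [items[p - 1] for p in systematic_positions(len(items), n, start, seed)]


if __name__ == "__main__":
    # Reproduces the locations in the example: N = 150, n = 10, start at item 92.
    print(systematic_positions(150, 10, start=92))
```

With `start=92`, this returns the locations in the table in the same order: 92, 107, 122, 137, 2, 17, 32, 47, 62, 77.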
Using Systematic Samples
Use systematic samples with data that has an inherent order, such as:
- items moving on an assembly line
- numbered invoices
- customers at checkout
If the population is in random order, this method effectively results in a simple random sample.
If the population is ordered by magnitude, this method results in a sample that is more informative than a simple random sample, because the selected items are spread evenly across the whole range of values.
If the population has cyclical variation, this method results in a sample that is less informative than a simple random sample, because the selected items can all fall at the same point in the cycle (for example, when the interval equals the cycle length). It is not appropriate for situations where there is a cycle in the population.
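The problem with cycles can be shown with a toy example. The sketch assumes a hypothetical population with a 7-day cycle: five low weekday values followed by two high weekend values, repeated for 10 weeks. With an interval of 7, every selected item falls on the same day of the week whatever the starting point. The sample therefore shows no variation at all, while the population mean is about 15.7. An interval that does not match the cycle picks up both kinds of days.

```python
import statistics


def every_kth(values, k, start):
    """Systematic selection: every k-th value beginning at 0-based index start."""
    return values[start::k]


# Hypothetical population with a 7-day cycle: 5 low weekday values followed by
# 2 high weekend values, repeated for 10 weeks (70 observations in total).
WEEK = [10, 10, 10, 10, 10, 30, 30]
POPULATION = WEEK * 10

if __name__ == "__main__":
    print("population mean:", round(statistics.mean(POPULATION), 2))
    # Interval 7 equals the cycle length, so every pick lands on the same weekday.
    print("k=7, start=0:", every_kth(POPULATION, 7, 0))  # only weekday values
    print("k=7, start=5:", every_kth(POPULATION, 7, 5))  # only weekend values
    # An interval that does not match the cycle mixes both kinds of days.
    print("k=3, start=0:", every_kth(POPULATION, 3, 0))
```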
Stratified random samples
Use when a population can be divided into mutually exclusive (non-overlapping) and collectively exhaustive (including all population items) subgroups, called strata, such as gender or political preference.
Each subgroup is randomly sampled.
Often the goal is to select a sample representative of all the strata in a population while minimizing the size of the whole sample.
Often the number sampled from each stratum reflects the proportion of that stratum in the population.
A stratified example

Each item in the list used earlier was a bird found on the shore after the Deepwater Horizon oil spill. The oiled condition of each bird was recorded as visibly oiled (20%), not visibly oiled (67%) or unknown (13%). Sorting the observations into these strata and selecting a random sample from each stratum generates a stratified random sample.
The number of observations selected from each group out of 10 reflects that group's share of the whole: 7 not visibly oiled, 2 visibly oiled and 1 unknown.

Oil Condition       Obs   Value
Not Visibly Oiled   60    27.58037
Not Visibly Oiled   49    27.4124
Not Visibly Oiled   48    27.4124
Not Visibly Oiled   130   28.96679
Not Visibly Oiled   69    27.59439
Not Visibly Oiled   133   28.96775
Not Visibly Oiled   66    27.58514
Unknown             106   28.9428
Visibly Oiled       71    27.80305
Visibly Oiled       72    27.97849
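The allocation in the example (7, 2 and 1 out of 10) can be reproduced with proportional allocation. The sketch below computes n times each stratum's share. It rounds down, then gives any leftover units to the strata with the largest fractional parts (largest-remainder rounding), so the allocations always add up to n. That rounding rule is an assumption; the example does not say how it rounded. The stratum sizes 30, 100 and 20 are also hypothetical. They are chosen to match the 20% / 67% / 13% split for 150 birds, because the actual counts are not given.

```python
import math
import random


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes.

    Uses largest-remainder rounding so the allocations add up to exactly n.
    """
    N = sum(stratum_sizes.values())
    exact = {s: n * size / N for s, size in stratum_sizes.items()}
    alloc = {s: math.floor(x) for s, x in exact.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    for s in sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_sample(strata, n, seed=None):
    """strata maps a stratum label to its list of items; returns label -> sampled items."""
    rng = random.Random(seed)
    alloc = proportional_allocation({s: len(items) for s, items in strata.items()}, n)
    # A simple random sample is drawn separately inside each stratum.
    return {s: rng.sample(items, alloc[s]) for s, items in strata.items()}


if __name__ == "__main__":
    # Hypothetical stratum sizes matching the 20% / 67% / 13% split of 150 birds.
    sizes = {"Visibly Oiled": 30, "Not Visibly Oiled": 100, "Unknown": 20}
    print(proportional_allocation(sizes, 10))
    # {'Visibly Oiled': 2, 'Not Visibly Oiled': 7, 'Unknown': 1}
    strata = {label: list(range(size)) for label, size in sizes.items()}  # stand-in IDs
    print(stratified_sample(strata, 10, seed=1))
```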
Cluster random samples
Cluster sampling is used when a list of all the individuals in a population doesn't exist, but a list of the locations of groups of individuals does exist.
The population is divided into units, often …
A group of units is selected randomly.
All items of interest in the selected units are measured or surveyed.
Multi-stage cluster sampling involves several iterations of this process.
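A minimal sketch of cluster sampling in Python follows. It groups the records by a cluster label, randomly chooses whole clusters, and keeps every item in them. It also includes an optional second stage that trims the result with a simple random sample. Using a simple random sample as the second stage is an assumption made for illustration, since the method for reducing the sample size is left open. The function name, its parameters and the bird records are all hypothetical.

```python
import random
from collections import defaultdict


def cluster_sample(items, cluster_of, n_clusters, seed=None, second_stage_size=None):
    """One- or two-stage cluster sampling.

    items: the population records; cluster_of(item) returns the item's cluster label.
    Stage 1: randomly choose n_clusters whole clusters and keep every item in them.
    Stage 2 (optional): if second_stage_size is given and smaller than the stage-1
    result, draw a simple random sample of that many items from it.
    """
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for item in items:
        clusters[cluster_of(item)].append(item)
    chosen = rng.sample(sorted(clusters), n_clusters)
    selected = [item for label in chosen for item in clusters[label]]
    if second_stage_size is not None and second_stage_size < len(selected):
        selected = rng.sample(selected, second_stage_size)
    return sorted(chosen), selected


if __name__ == "__main__":
    # Hypothetical records: (week found after the spill, bird ID).
    weeks = [18, 18, 20, 20, 22, 22, 22, 25, 27, 27, 27, 30, 33, 33]
    birds = [(week, bird_id) for bird_id, week in enumerate(weeks)]
    chosen, sample = cluster_sample(birds, lambda b: b[0], n_clusters=4, seed=5)
    print("weeks chosen:", chosen, "sample size:", len(sample))
    # Two-stage version: keep 4 whole weeks, then trim to at most 6 birds.
    chosen, sample = cluster_sample(birds, lambda b: b[0], 4, seed=5, second_stage_size=6)
    print("weeks chosen:", chosen, "sample size:", len(sample))
```

In the one-stage version, the sample size depends on how many items happen to fall in the chosen clusters. That is why a second stage can be useful.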
A cluster example

Clusters can be defined by location in time: for example, by the number of weeks after the oil spill in which a bird was found.
The items were sorted by the week they were found, and then a random sample of 4 of the weeks was selected. In an ideal world, the number of items in the selected weeks would give exactly the sample size we wanted; however, it often doesn't. A second sampling method might then be used to reduce the sample size after the cluster stage.

Week from spill   Value
20                28.96411
20                27.78272
22                28.42994
22                28.96064
22                28.55
27                28.97873
27                28.593
27                28.59478
33                28.95648
33                28.96634
