Preventing Bad - Biased Samples - Coursera Reading
Consider, for example, a national sample of 1,000 cell phone numbers selected using simple random sampling (SRS).
While in expectation a given sample will include a representative mix of numbers from area codes across the nation,
all possible samples of a given size are equally likely under SRS. What this means is that one particular simple
random sample of cell phone numbers that only includes area codes from Florida is just as likely as one particular
simple random sample that includes a representative selection across the states. (There are far more representative
samples than Florida-only samples, so an all-Florida sample is rare, but SRS does nothing to rule it out.) Ideally, we
would like to use design strategies to reduce the chances of such a “bad sample” occurring. The major statistical
problem with the simple random “Florida” sample is that any estimate that we compute after collecting data from the
sample will likely be very different from the true population parameter that we are trying to estimate, especially if
the variable of interest tends to take on very different values in Florida relative to the rest of the nation. Because
SRS gives each of these extreme samples the same chance of selection as each more representative sample, estimates
based on simple random samples can differ noticeably from one sample to the next; that is, the sampling distribution
of the estimate can be quite variable.
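To make this variability concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: a hypothetical population of 10,000 units, 10% of which are labeled "FL" and have systematically higher values, and repeated simple random samples of size 100. It only illustrates how the makeup of a simple random sample, and the estimate computed from it, can change from draw to draw.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption for illustration): 10,000 units,
# 10% labeled "FL" with systematically higher values than the rest.
population = [("FL", rng.gauss(80, 10)) for _ in range(1_000)]
population += [("other", rng.gauss(50, 10)) for _ in range(9_000)]
true_mean = statistics.mean([value for _, value in population])


def srs_once(n):
    """Draw one simple random sample of size n (without replacement) and
    return the share of FL units in it and the sample mean."""
    sample = rng.sample(population, n)
    fl_share = sum(1 for label, _ in sample if label == "FL") / n
    estimate = statistics.mean([value for _, value in sample])
    return fl_share, estimate


# Repeat the SRS many times to approximate the sampling distribution.
results = [srs_once(100) for _ in range(2_000)]
fl_shares = [share for share, _ in results]
estimates = [estimate for _, estimate in results]

print(f"true mean: {true_mean:.2f}")
print(f"FL share across samples: min={min(fl_shares):.2f}, max={max(fl_shares):.2f}")
print(f"mean of SRS estimates: {statistics.mean(estimates):.2f}")
print(f"SD of SRS estimates:   {statistics.stdev(estimates):.2f}")
```

The estimates average out close to the true mean, but the share of "FL" units differs from sample to sample, and samples that happen to over- or under-represent that group produce estimates further from the truth.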
A very common sampling technique used to minimize the sampling variance that can arise from these so-called “bad
samples” in SRS is stratification. You’ve already been introduced to stratification in an earlier lecture. When we
conduct stratified sampling, we first divide the population of interest into mutually exclusive divisions (or “strata”),
such as states, and allocate a portion of our sample to each stratum. This ensures that some sample will be selected
from every stratum, and that the overall sample will therefore be representative of the target population with respect
to the stratifying variable. For example, using a technique
known as proportionate allocation, suppose that we knew that 55% of students enrolled in a particular college were
females, and 45% were males. If we wanted to draw a sample of 1,000 students from this college, we would randomly
select 550 females from a list of all females enrolled, and 450 males from a list of all males enrolled. This ensures
that our entire sample of size 1,000 won’t include only females!
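The college example can be sketched in a few lines of Python. This is a sketch under stated assumptions, not a definitive implementation: the function names `proportionate_allocation` and `stratified_sample`, the enrollment counts (11,000 females and 9,000 males, chosen to give the 55%/45% split), and the ID strings are all hypothetical. Largest-remainder rounding is one reasonable way, assumed here, to keep the allocations summing to the requested sample size when the proportions do not divide evenly.

```python
import random


def proportionate_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.

    Uses largest-remainder rounding so the allocations sum to exactly n.
    """
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}  # round down
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def stratified_sample(frames, n, seed=None):
    """Draw a simple random sample within each stratum, sized by proportionate allocation."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in frames.items()}
    alloc = proportionate_allocation(sizes, n)
    return {name: rng.sample(units, alloc[name]) for name, units in frames.items()}


# Hypothetical enrollment lists: 55% females, 45% males.
frames = {
    "female": [f"F{i}" for i in range(11_000)],
    "male": [f"M{i}" for i in range(9_000)],
}
sample = stratified_sample(frames, 1_000, seed=1)
print({name: len(units) for name, units in sample.items()})
```

With these hypothetical lists the allocation is 550 females and 450 males on every run; only which students are chosen within each list is left to chance.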