A Brief Introduction To Sampling

Researchers usually cannot make direct observations of every
individual in the population they are studying. Instead, they
collect data from a subset of individuals – a sample – and use
those observations to make inferences about the entire
population.

Ideally, the sample corresponds to the larger population on the
characteristic(s) of interest. In that case, the researcher's
conclusions from the sample are probably applicable to the
entire population.

This type of correspondence between the sample and the larger
population is most important when a researcher wants to know
what proportion of the population has a certain characteristic –
like a particular opinion or a demographic feature. Public
opinion polls that try to describe the percentage of the
population that plans to vote for a particular candidate, for
example, require a sample that is highly representative of the
population.

Probability samples and convenience samples

Two general approaches to sampling are used in social science
research. With probability sampling, all elements (e.g., persons,
households) in the population have some opportunity of being
included in the sample, and the mathematical probability that
any one of them will be selected can be calculated.

With nonprobability sampling, in contrast, population elements
are selected on the basis of their availability (e.g., because they
volunteered) or because of the researcher's personal judgment
that they are representative. The consequence is that an
unknown portion of the population is excluded (e.g., those who
did not volunteer). One of the most common types of
nonprobability sample is called a convenience sample – not
because such samples are necessarily easy to recruit, but
because the researcher uses whatever individuals are available
rather than selecting from the entire population.

Because some members of the population have no chance of
being sampled, the extent to which a convenience sample –
regardless of its size – actually represents the entire population
cannot be known.

Recruiting a probability sample is not always a priority for
researchers. A scientist can demonstrate that a particular trait
occurs in a population by documenting a single instance. For
example, the assertion that all lesbians are mentally ill can be
refuted by documenting the existence of even one lesbian who
is free from psychopathology.

Another situation in which a probability sample is not
necessary is when a researcher wishes to describe a particular
group in an exploratory way. For example, interviewing 25
people with AIDS (PWAs) about their experiences with HIV
could provide valuable insights about stress and coping, even
though it would not yield data about the proportion of PWAs in
the general population who share those experiences.

Types of probability samples

Many strategies can be used to create a probability sample.
Each starts with a sampling frame, which can be thought of as a
list of all elements in the population of interest (e.g., names of
individuals, telephone numbers, house addresses, census tracts).
The sampling frame operationally defines the target population
from which the sample is drawn and to which the sample data
will be generalized.

Probably the most familiar type of probability sample is
the simple random sample, for which all elements in the
sampling frame have an equal chance of selection, and
sampling is done in a single stage with each element selected
independently (rather than, for example, in clusters).
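
As a minimal sketch of this idea (not a procedure given in the source), a simple random sample can be drawn with Python's standard library. The sampling frame of 10,000 numbered voter IDs, the sample size of 500, and the function name are hypothetical and chosen only for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n elements from the frame in a single stage.

    Every element has the same chance of selection, and no element
    is selected more than once.
    """
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(frame, n)


# Hypothetical sampling frame: 10,000 voter IDs.
frame = list(range(1, 10001))
sample = simple_random_sample(frame, 500, seed=42)
print(len(sample))
```

In practice the frame would be a real list (names, telephone numbers, addresses) rather than a range of integers.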

Somewhat more common than simple random samples
are systematic samples, which are drawn by starting at a
randomly selected element in the sampling frame and then
taking every nth element (e.g., starting at a random location in a
telephone book and then taking every 100th name).
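
The following sketch shows one common way to implement a systematic sample. It assumes the random starting point is chosen within the first interval of the frame, so that a single pass covers the whole list; the frame of 10,000 hypothetical voter labels and the step of 500 are illustrative only.

```python
import random


def systematic_sample(frame, step, seed=None):
    """Pick a random start within the first `step` elements,
    then take every `step`-th element after it."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # assumption: start lies in the first interval
    return frame[start::step]


# Hypothetical sampling frame: an ordered roster of 10,000 voters.
frame = [f"voter_{i:05d}" for i in range(10000)]
sample = systematic_sample(frame, step=500, seed=1)
print(len(sample))  # 10,000 / 500 = 20 names
```

If the frame is ordered in a way that repeats with the same period as the step, a systematic sample can be biased, so the ordering of the list deserves a look before this method is used.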

In yet another approach, cluster sampling, a researcher selects
the sample in stages, first selecting groups of elements, or
clusters (e.g., city blocks, census tracts, schools), and then
selecting individual elements from each cluster (e.g., randomly
or by systematic sampling).
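
A two-stage version of cluster sampling might be sketched as follows. The clusters (50 city blocks of 30 households each), the number of clusters drawn, and the number of households per cluster are all hypothetical, and random selection is assumed at both stages.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly select clusters.
    Stage 2: randomly select elements within each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        k = min(n_per_cluster, len(members))  # small clusters are taken whole
        sample.extend((name, member) for member in rng.sample(members, k))
    return sample


# Hypothetical frame: 50 city blocks, each containing 30 households.
clusters = {
    f"block_{b}": [f"household_{b}_{h}" for h in range(30)]
    for b in range(50)
}
sample = cluster_sample(clusters, n_clusters=5, n_per_cluster=10, seed=7)
print(len(sample))  # 5 blocks x 10 households = 50
```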

An example

Suppose some researchers want to find out which of two
mayoral candidates is favored by voters. Obtaining a
probability sample would involve defining the target population
(in this case, all eligible voters in the city) and using one of
many available procedures for selecting a relatively small
number (probably fewer than 1,000) of those people for
interviewing. For example, the researchers might create a
systematic sample by obtaining the voter registration roster,
starting at a randomly selected name, and contacting every
500th person thereafter. Or, in a more sophisticated procedure,
the researchers might use a computer to randomly select
telephone numbers from all of those in use in the city, and then
interview a registered voter at each telephone number. (This
procedure would yield a sample that represents only those
people who have a telephone.)

Several procedures would also be available for recruiting a
convenience sample, but none of them would include the entire
population as potential respondents. For example, the
researchers might ascertain the voting preferences of their own
friends and acquaintances. Or they might interview shoppers at
a local mall. Or they might publish two telephone numbers in
the local newspaper and ask readers to call either number in
order to "vote" for one of the candidates. The important feature
of these methods is that they would systematically exclude
some members of the population (respectively, eligible voters
who do not know the researchers, do not go to the shopping
mall, and do not read the newspaper). Consequently, their
findings could not be generalized to the population of city
voters.

Evaluating samples

Samples are evaluated primarily according to the procedures by
which they were selected rather than by their final composition
or size. In the example above, it would be impossible to know
if the convenience sample consisting of the researchers' friends
or mall shoppers is representative, even if its demographic
characteristics closely resembled those of the city electorate
(e.g., the same ratios of women to men and Blacks to Whites).
And even if several thousand people called the published
telephone numbers, the sample would be seriously biased.

Of course, results from a probability sample might not be
accurate for many reasons. Using probability sampling
procedures is necessary but not sufficient for obtaining results
that can be generalized with confidence to the entire
population. One of the major concerns about a probability
sample is whether its response rate is sufficiently high.

Response rates. Once a sample is selected, an attempt is made
to collect data (e.g., through interviews or questionnaires) from
all of its members. In practice, researchers never obtain
responses from 100% of the sample. Some sample members
inevitably are traveling, hospitalized, incarcerated, away at
school, or in the military. Others cannot be contacted because
of their work schedule, community involvement, or social life.
Others simply refuse to participate in the study, even after the
best efforts of the researcher to persuade them otherwise.

Each type of nonparticipation biases the final sample, usually in
unknown ways. In the 1980 General Social Survey (GSS), for
example, those who refused to be interviewed were later found
to be more likely than others to be married, middle-income, and
over 30 years of age, whereas those who were excluded from
the survey because they were never at home were less likely to
be married and more likely to live alone (Smith, 1983). The
importance of intensive efforts at recontacting sample members
who are difficult to reach (e.g., because they are rarely at home)
was apparent in that GSS respondents who required multiple
contact attempts before an interview was completed (the "hard-
to-gets") differed significantly from other respondents in their
labor force participation, socioeconomic status, age, marital
status, number of children, health, and sex (Smith, 1983).

The response rate describes the extent to which the final data
set includes all sample members. It is calculated as the number
of people with whom interviews are completed ("completes")
divided by the total number of people or households in the
entire sample, including those who refused to participate and
those who were not at home.
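
The calculation described above can be written out as a short function. The counts used here (650 completed interviews, 200 refusals, 150 not at home) are invented for illustration, and the optional fourth category is an assumption covering any other sample members who were never interviewed.

```python
def response_rate(completes, refusals, not_at_home, other_nonresponse=0):
    """Completed interviews divided by the entire selected sample,
    including refusals and people who were never reached."""
    total = completes + refusals + not_at_home + other_nonresponse
    if total == 0:
        raise ValueError("the sample must contain at least one member")
    return completes / total


# Hypothetical sample of 1,000 people.
rate = response_rate(completes=650, refusals=200, not_at_home=150)
print(f"{rate:.0%}")  # 65%
```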

Whether data are collected through face-to-face interviews,
telephone interviews, or mail-in surveys, a high response rate is
extremely important when results will be generalized to a larger
population. The lower the response rate, the greater the sample
bias. Fowler (1984), for example, warned that data from mail-in
surveys with return rates of "20 or 30 percent, which are not
uncommon for mail surveys that are not followed up
effectively, usually look nothing at all like the sampled
populations" (Fowler, 1984, p. 49). This is because "people
who have a particular interest in the subject matter or the
research itself are more likely to return mail questionnaires than
those who are less interested" (p. 49).

Fowler (1984) also warned: "[O]ne occasionally will see reports
of mail surveys in which 5 to 20 percent of the sample
responded. In such instances, the final sample has little
relationship to the original sampling process. Those responding
are essentially self-selected. It is very unlikely that such
procedures will provide any credible statistics about the
characteristics of the population as a whole" (p. 48).

Sample size and sampling error. The use of appropriate
sampling methods and an adequate response rate are necessary
for a representative sample, but not sufficient. In addition, the
sample size must be evaluated.

All other things being equal, smaller samples (e.g., those with
fewer than 1,000 respondents) have greater sampling error than
larger samples. To better understand the notion of sampling
error, it is helpful to recall that data from a sample provide
merely an estimate of the true proportion of the population that
has a particular characteristic. If 100 different samples are
drawn from the same sampling frame, they could potentially
result in 100 different patterns of responses to the same
question. These patterns, however, would converge around the
true pattern in the population.
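
A small simulation can make this concrete. The sketch below assumes a population in which 51% hold a given opinion (a made-up figure) and treats each respondent as an independent draw, which approximates simple random sampling from a very large population. It draws 100 samples at each of two sample sizes and compares how widely the estimates scatter.

```python
import random
import statistics


def simulate_estimates(true_p, sample_size, n_samples, seed=None):
    """Return the estimated proportion from each of n_samples samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_samples):
        # Each respondent holds the opinion with probability true_p.
        hits = sum(rng.random() < true_p for _ in range(sample_size))
        estimates.append(hits / sample_size)
    return estimates


# Hypothetical population value: 51% hold the opinion.
small = simulate_estimates(0.51, sample_size=100, n_samples=100, seed=0)
large = simulate_estimates(0.51, sample_size=1000, n_samples=100, seed=0)

for label, est in (("n=100", small), ("n=1000", large)):
    print(label, round(statistics.mean(est), 3), round(statistics.stdev(est), 3))
```

Both sets of estimates center near the assumed population value, but the estimates from the larger samples cluster more tightly around it.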

The sampling error is a number that describes the precision of
an estimate from any one of those samples. It is usually
expressed as a margin of error associated with a statistical level
of confidence. For example, a presidential preference poll may
report that the incumbent is favored by 51% of the voters, with
a margin of error of plus-or-minus 3 points at a confidence
level of 95%. This means that if the same survey were
conducted with 100 different samples of voters, about 95 of the
resulting intervals (each sample's estimate ± 3 points) would be
expected to contain the true percentage of voters who favor the
incumbent; for this sample, that interval is 48% to 54%.

The margin of error due to sampling decreases as sample size
increases, to a point. For most purposes, samples of between
1,000 and 2,000 respondents have a sufficiently small margin
of error that larger samples are not cost-effective. However, if
subgroups are to be examined, a larger sample may be
necessary because the margin of error for each subgroup is
determined by the number of people in it. For example,
although a national survey with a probability sample of 1,000
adults has a margin of error of roughly 1-3 percentage points
(using a 95% confidence interval), analyses of responses from
the African Americans in that sample (who would probably
number about 100) would have a margin of error of roughly
4-10 points.
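
The relationship between sample size and margin of error can be illustrated with the standard textbook approximation for a proportion, z * sqrt(p * (1 - p) / n). This formula is not given in the text above; it assumes a simple random sample, uses z = 1.96 for a 95% confidence level, and ignores design effects and nonresponse.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for an estimated proportion p
    from a simple random sample of size n (z=1.96 for 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)


# The margin is widest when p = 0.5 and narrower for proportions near 0 or 1.
for n in (1000, 100):
    widest = margin_of_error(0.5, n)
    narrower = margin_of_error(0.1, n)
    print(f"n={n}: {narrower:.1%} to {widest:.1%}")
```

With p = 0.5, the margin is about 3.1 points for 1,000 respondents and about 9.8 points for 100, which is in line with the upper ends of the ranges quoted above.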

Other considerations

This brief discussion has focused on sampling procedures.
However, many other factors also affect the quality of data
from a research study. For example, it is always important to
critically evaluate the specific procedures used for obtaining
responses, including the questions that were asked.

In order for research data to be meaningful, the questionnaire
and the procedures used to collect the data must be valid.
The validity of a method (e.g., a survey questionnaire) refers to
how accurately it measures what it is supposed to measure. If
survey items are so complex or ambiguous that different
respondents interpret them differently, for example, their
validity is compromised. Validity is also threatened if
respondents do not provide accurate or honest answers, either
because of their inability to do so (e.g., due to memory
problems) or their unwillingness to answer truthfully (e.g.,
because the researchers communicated their biases or
expectations to the respondents).

Samples in social and behavioral research

Most behavioral and social science studies use convenience
samples consisting of students, paid volunteers, patients,
prisoners, or members of friendship networks or organizations.
Studies with such samples are useful primarily for documenting
that a particular characteristic or phenomenon occurs within a
given group or, alternatively, demonstrating that not all
members of that group manifest a particular trait. Such studies
are also very useful for detecting relationships among different
phenomena.

Sometimes matched convenience samples are used to compare
two groups (e.g., psychological test scores of gay people and
heterosexuals). With this procedure, each individual in the first
sample has a counterpart in the second sample who is of the
same gender, race, educational background, age, or whatever
other characteristics are judged to be relevant. The purpose of
matching is to eliminate known sources of bias; however, the
problem of potential bias from hidden sources still remains.

With a hard-to-reach population (e.g., gay people or persons
who engage in homosexual behavior), a series of studies with
nonprobability samples can suggest rough estimates of the
proportion of the population manifesting various
characteristics. When similar results are obtained repeatedly
with many different nonprobability samples, the likelihood that
those results apply to the population is greater than when only a
single nonprobability sample is used. Nevertheless, inferences
based on such data must be cautious because of the possibility
of hidden systematic bias.

Strictly speaking, inferences cannot be drawn from a
nonprobability sample about the proportion of the population
manifesting (or not manifesting) a particular characteristic.
Realistically, however, funding limitations and the
methodological difficulties of sampling a relatively small and
partially hidden population have usually prohibited the use of
probability samples in research on sexual orientation.

It is extremely important, therefore, that findings obtained with
convenience samples be critically evaluated. Readers should
always ask the following questions:

• What types of people were systematically excluded
  from the sample?
• What types of people were over-represented in the
  sample?
• Have the findings been replicated by different
  researchers using a variety of data-collection methods
  with different samples?
