Lecture 2


Sampling Methods

• Populations, Parameters, Samples, and Statistics
• Ideas Behind Sampling
• Good Sampling Designs
• Bad Sampling Designs
• Surveys
Sections 8.1 – 8.5
Populations and Samples
• Population: any complete collection of people or objects that a
statistician is interested in studying
• Sample: set of units selected from a population that a statistician
analyzes to better understand the population
• Parameter: value that describes a characteristic of a population
• Statistic: value calculated from a sample that serves as an estimate
of a parameter
Problem with Populations
• Problem: Parameters are generally unknown
• Population is often too large to get response from every member
• Impractical to examine every subject
[Figure: a sample is taken from the population of college students
(males and females). The population proportion is unknown; the sample
proportion is calculated from the sample.]
• Solution: Take a _________ from the population
• Sample _______________ population and (usually) _____________________________
• Calculate ______________ and use it to _______________________________
Example: Identifying Parts of a Study
• Scenario: Want to learn about how frequently college students
engage in binge drinking. Survey of 796 college students found
288 (or 36.2%) reported binge drinking within the last month.
• Task: Identify the population, parameter, sample, and statistic.
• Answer:
• Population: College students
• Parameter: Overall proportion of ____________________________________________
• Exact value is _____________ so the best we can do is _________________________________
• Sample: ___________________________________
• Statistic: _________ (or ______)
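The arithmetic behind the reported percentage can be sketched in a few lines. This is only an illustration using the two counts given in the scenario; the function name `sample_proportion` is a hypothetical helper, not something from the lecture.

```python
# Counts taken from the binge-drinking survey scenario.
sample_size = 796
binge_drinkers = 288


def sample_proportion(successes: int, n: int) -> float:
    """Return the fraction of sampled units that have the characteristic."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return successes / n


p_hat = sample_proportion(binge_drinkers, sample_size)
print(f"Sample proportion: {p_hat:.3f} ({p_hat:.1%})")
# prints "Sample proportion: 0.362 (36.2%)"
```

The value is computed from the sample only, so it estimates the unknown population proportion rather than equaling it.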
Example: Identifying Parts of a Study
• Scenario: Want to learn about how frequently college students
engage in binge drinking. Survey of 796 college students found
288 (or 36.2%) reported binge drinking within the last month.
• Question: What does the statistic tell us about the parameter?
• Answer: Overall proportion of U.S. college students who binge
drink is _______________________
• Question: Why would it be difficult to obtain the exact value of
the parameter?
• Answer:
• Population _________________________________________
• Population is _______________________  _________________________________________
Ideas Behind Sampling
• Sampling Frame: list of cases from which a random sample is
drawn; usually a large subset of the population

• Samples must be:

• Representative: characteristics of the sample closely resemble the
characteristics of the population
• If not, the statistic could be biased (systematically deviates from the parameter)
• Selected Randomly: observations are chosen by chance rather than by
deliberately and intentionally selecting specific cases
• Large Enough: need enough information to understand the population
Good Sampling Methods
• Simple Random Sample: method of sampling in which every
member of the sampling frame has the same chance of being selected
• Systematic Sample: method of sampling where members are
sampled according to some predetermined rule by skipping a
certain number of people and then sampling the nth person
Good Sampling Methods
• Stratified Random Sample: method of sampling in which the
population is first divided into groups (called strata) according to
some characteristic and then a random sample is taken from
within each stratum
• Cluster Sample: sampling method where the population is
divided into similar groups (called clusters), a simple random
sample of the clusters is taken, and then every member in each
selected cluster becomes part of the sample
Example: Generating Samples
• Scenario: The United States has four different census regions:
West, Midwest, Northeast, and South. Suppose we have a
sampling frame of all 120 million households in the country. We
want to learn what the average annual income is for United States
families.
• Question: How can we generate each of the
following random samples?
• Simple random sample
• Stratified random sample
• Cluster sample
• Systematic sample
Example: Simple Random Sample
• Question: How can we generate a simple random sample to
collect data to estimate the average annual income of United
States families?
• Answer:
• Use a random number generator to select 2000 rows from the 120
million households

• Question: How well will this sampling method work?

• Answer: Very well
• Very likely to get a good mix of households from all census areas
• Main risk is underrepresentation of one census area: an unlucky sample
may yield only 100 cases from the West
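A simple random sample like the one described above can be sketched with Python's standard `random` module. The frame here is a small made-up stand-in (12,000 households, 3,000 per region) rather than the real 120 million rows, and the names `build_frame` and `simple_random_sample` are hypothetical.

```python
import random
from collections import Counter

REGIONS = ["West", "Midwest", "Northeast", "South"]


def build_frame(households_per_region: int) -> list[tuple[int, str]]:
    """Build a toy sampling frame of (household_id, region) pairs."""
    frame = []
    for region in REGIONS:
        for _ in range(households_per_region):
            frame.append((len(frame), region))
    return frame


def simple_random_sample(frame, k, seed=None):
    """Draw k units without replacement; every unit is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, k)


frame = build_frame(3000)  # hypothetical frame of 12,000 households
sample = simple_random_sample(frame, 200, seed=1)

# Region counts vary from draw to draw; nothing forces them to be equal.
region_counts = Counter(region for _, region in sample)
print(region_counts)
```

Rerunning with different seeds shows the region counts drifting around 50 each, which is the "unlucky sample" risk in miniature.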
Example: Stratified Random Sample
• Question: How can we generate a stratified random sample to
collect data to estimate the average annual income of United
States families?
• Answer:
• Divide sampling frame into the four census areas
• Take random sample of 500 households from each census area

• Question: How well will this sampling method work?

• Answer: Extremely well
• Obtain random samples from each census area
• Guarantees each census area is well represented
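The two steps above (split the frame by census area, then sample within each) might look like this in Python. The frame and its region sizes are invented for illustration, and `stratified_sample` is a hypothetical helper name.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(frame, per_stratum, seed=None):
    """Draw a simple random sample of per_stratum units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[unit[1]].append(unit)  # unit = (household_id, region)
    sample = []
    for region in sorted(strata):
        sample.extend(rng.sample(strata[region], per_stratum))
    return sample


# Toy frame with made-up region sizes (not real census figures).
region_sizes = {"West": 2000, "Midwest": 2500, "Northeast": 1500, "South": 4000}
frame = [
    (f"{region}-{i}", region)
    for region, size in region_sizes.items()
    for i in range(size)
]

sample = stratified_sample(frame, 50, seed=1)
counts = Counter(region for _, region in sample)
print(counts)  # each region contributes exactly 50 households
```

Unlike the simple random sample, the per-region counts are fixed by design, which is what guarantees every census area is represented.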
Example: Cluster Sample
• Question: How can we generate a cluster sample to collect data to
estimate the average annual income of United States families?
• Answer:
• Choose a random sample of census areas (e.g. two of the four regions)
• Sample every household in these two regions

• Question: How well will this sampling method work?

• Answer: Terribly
• Would require us to sample millions of households
• Not feasible in this situation
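A small sketch makes the size problem concrete: picking two of four regions and keeping every household in them returns half of the frame. The frame below is a toy with 3,000 households per region, and `cluster_sample` is a hypothetical helper name.

```python
import random

REGIONS = ["West", "Midwest", "Northeast", "South"]


def cluster_sample(frame, n_clusters, seed=None):
    """Randomly pick n_clusters clusters and keep every unit inside them."""
    rng = random.Random(seed)
    clusters = sorted({region for _, region in frame})
    chosen = set(rng.sample(clusters, n_clusters))
    sample = [unit for unit in frame if unit[1] in chosen]
    return chosen, sample


# Toy frame: 3,000 households per region (made-up numbers).
frame = [(f"{region}-{i}", region) for region in REGIONS for i in range(3000)]

chosen, sample = cluster_sample(frame, 2, seed=1)
print(sorted(chosen), len(sample))  # two regions, 6,000 of 12,000 households
```

Scaled up to a frame of 120 million households split evenly, the same procedure would mean surveying about 60 million of them.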
Example: Systematic Sample
• Question: How can we generate a systematic sample to collect
data to estimate the average annual income of United States families?
• Answer:
• Choose every 60,000th observation from the sampling frame to obtain
our 2000 cases

• Question: How well will this sampling method work?

• Answer: Very well
• Should be quite similar to a simple random sample
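The "every 60,000th observation" rule can be sketched as index arithmetic, without building the full frame. Choosing a random starting row is an assumption added here (the slide does not say where to start), and `systematic_indices` is a hypothetical helper name.

```python
import random


def systematic_indices(frame_size, sample_size, seed=None):
    """Return the step and the row indices of a systematic sample."""
    step = frame_size // sample_size
    rng = random.Random(seed)
    start = rng.randrange(step)  # assumed: random start within the first step
    indices = list(range(start, step * sample_size, step))
    return step, indices


step, indices = systematic_indices(120_000_000, 2000, seed=1)
print(step, len(indices))  # prints "60000 2000"
```

Every selected row is exactly `step` positions after the previous one, so once the start is fixed the whole sample is determined.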
Bad Sampling Methods
• Convenience Sample: a sample obtained from the people who
were easiest to access
• Voluntary Sample: a sample obtained when a large group of
individuals is invited to participate and all submitted responses are
recorded; statistician does little to no work other than offer the
opportunity to participate

• Both of these methods can result in undercoverage, where some
portion of the population is underrepresented or not represented
at all in the sample.
• Can be a large source of bias
Example: Convenience Sample
• Scenario: The Washington Capitals played the Vegas Golden
Knights in the Stanley Cup finals last year. You ask everyone in
your group of friends who they were rooting for.
• Question: Why is this a convenience sample?
• Answer: Obtaining responses from the people who are easiest to
access but sample does not necessarily reflect the opinion of the
entire population
• Question: Why is this sample likely to be biased?
• Answer: You and your friends likely have similar opinions on
sports teams, which may not mirror those of the entire hockey fan base
Example: Voluntary Sample
• Scenario: After a political debate, CNN wants to gauge if its
viewers believed the Democratic candidate or the Republican
candidate won the debate. They put a link to a poll on their
website across the scroll at the bottom of the screen and invite the
viewing audience to respond.
• Question: Why is this a voluntary sample?
• Answer: Respondents self-select into the study and no sampling
is done by the network
• Question: Why is this likely to be biased?
• Answer: Two reasons…
• CNN has a more liberal audience so sample doesn’t represent population
• Only viewers who feel passionately will take time to vote
Example: Study Design
• Scenario: Three students want to estimate proportion of college
students who use a Mac:
• A: Asks 10 friends and finds that 8 (80%) use a Mac
• B: Observes a class of 300 students and finds that of the 200 who
brought computers, 130 (65%) use a Mac
• C: Randomly samples 300 students from the university, sends out a
survey to each, and finds 120 of 200 who respond (60%) use a Mac
• Question: Which student’s sampling method is the best?
• Answer: Student C
• Used simple random sample so sample is likely representative
• A: Used convenience sample, not random, and used small sample size
• B: Should have used cluster sampling by sampling more classes
Surveys
• Survey: method of data collection where respondents are asked
questions and self-report responses on various topics

• Sources of Bias in Designing Survey Questions

• Complicated questions
• Vague concepts
• Leading questions
• Central tendency bias
• Error prone response options
• Voluntary response bias
• Covered earlier as the problem with voluntary samples
Example: Complicated Question
• Scenario: An online survey asked respondents:
• “How likely are you to go out for dinner and a movie this weekend?”
• Choose: Very likely/somewhat likely/not likely
• Question: Why is this a complicated question?
• Answer: Asking about two events in the same question
• Respondents planning only one activity may answer “not likely”
• Question: How can this question be improved?
• Answer #1: Change wording and responses
• Ask: “Which of the following do you plan on doing this weekend?”
• Choose: Dinner and movie/ movie/ dinner
• Answer #2: Ask two different questions
Example: Vague Concepts
• Scenario: In the 1960s, a market research firm wanted to learn
about stay-at-home moms' favorite dish soap and asked the
question, “What is your favorite soap?”
• Question: What do you think happened when the firm got its
results?
• Answer: #1 response was “Days of Our Lives”
• The term “soap” is ambiguous
• Question: What should have been done instead?
• Answer: Clearly define “soap”
• Ask: “What is your favorite dish soap?”
Example: Leading Questions
• Scenario: Online survey question asked to gauge support of the
eligibility of athletes who used steroids for the Hall of Fame:
• “Should players who gained an unfair advantage in professional sports
by using steroids be eligible to be enshrined into the Hall of Fame?”
• Question: What is wrong with this question?
• Answer: Two biased phrases
• “Unfair advantage”: Implies cheating; “Enshrined”: Implies glorification
• Writer clearly believes players on steroids receive unfair advantage
• Question: How can the question be reworded to eliminate bias?
• Answer: “Should players who use steroids be allowed in the Hall of
Fame?”
Example: Central Tendency Bias
• Scenario: Huffington Post asked respondents:
• “On a scale from 1 to 5 with 1 being strongly oppose and 5 being
strongly support, what is your opinion on using the death penalty for
people convicted of first degree murder?”
• Question: What response are people most likely to choose?
• Answer: 3
• People tend not to take sides on controversial issues
• Question: How can the bias be eliminated?
• Answer: Use scale from 1 to 4 or 1 to 6
• Forces respondents to choose a side
Example: Error Prone Response Options
• Scenario: One question in an exit poll after an election asked for
the respondent’s age with the following answer choices:
• Choose: 18-30/30-40/40-50/50-60/60+
• Question: What is the problem with these answer choices?
• Answer: Categories overlap
• Two choices for 30, 40, 50, 60
• Question: How can this problem be fixed?
• Answer: Change categories
• Choose: 18-29/30-39/40-49/50-59/60+
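The overlap can be checked mechanically by asking which categories each age falls into. This is a minimal sketch; the dictionaries encode the answer choices as inclusive (low, high) ranges, and `matching_categories` is a hypothetical helper name.

```python
import math

# Original exit-poll choices, as inclusive (low, high) age ranges.
original = {
    "18-30": (18, 30),
    "30-40": (30, 40),
    "40-50": (40, 50),
    "50-60": (50, 60),
    "60+": (60, math.inf),
}

# Non-overlapping version of the same choices.
fixed = {
    "18-29": (18, 29),
    "30-39": (30, 39),
    "40-49": (40, 49),
    "50-59": (50, 59),
    "60+": (60, math.inf),
}


def matching_categories(age, categories):
    """Return every category label whose range contains the given age."""
    return [label for label, (low, high) in categories.items() if low <= age <= high]


ages = range(18, 100)
ambiguous_original = [a for a in ages if len(matching_categories(a, original)) > 1]
ambiguous_fixed = [a for a in ages if len(matching_categories(a, fixed)) > 1]

print(ambiguous_original)  # prints [30, 40, 50, 60]
print(ambiguous_fixed)     # prints []
```

With the fixed categories, every age from 18 up matches exactly one answer choice.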
Example: Identify Issue in Survey
• Survey Question: Are you in favor of repealing Obamacare and
replacing it with a new health care bill?
• Response Options: Yes/No
• Question: What type of issue exists with this survey question?
• Answer: Complicated question
• Could favor repealing Obamacare, but not expanding to universal health care
• Could support Obamacare, but want additional options in a new bill
