The document discusses different sampling methods like simple random sampling, stratified random sampling, and cluster sampling that can produce representative samples, as well as bad methods like convenience sampling. It also covers how to identify study design elements like populations, parameters, samples, and statistics. Common sources of bias in survey design like leading questions or vague concepts are also examined.
Populations, Parameters, Samples, and Statistics
Ideas Behind Sampling, Good Sampling Designs, Bad Sampling Designs, Surveys
Lecture 2
Sections 8.1 – 8.5

Populations and Samples
• Population: any complete collection of people or objects that a statistician is interested in studying
• Sample: set of units selected from a population that a statistician analyzes to better understand the population
• Parameter: value that describes a characteristic of a population
• Statistic: value calculated from a sample that serves as an estimate of a parameter

Problem with Populations
• Problem: Parameters are generally unknown
  • Population is often too large to get response from every member
  • Impractical to examine every subject
[Figure: taking a sample from a population. Population: college students, with the population proportion of males and females unknown; the sample gives a sample proportion.]
• Solution: Take a _________ from the population
  • Sample _______________ population and (usually) _____________________________
  • Calculate ______________ and use it to _______________________________

Example: Identifying Parts of a Study
• Scenario: Want to learn about how frequently college students engage in binge drinking. Survey of 796 college students found 288 (or 36.2%) reported binge drinking within the last month.
• Task: Identify the population, parameter, sample, and statistic.
• Answer:
  • Population: College students
  • Parameter: Overall proportion of ____________________________________________
    • Exact value is _____________ so the best we can do is _________________________________
  • Sample: ___________________________________
  • Statistic: _________ (or ______)
• Question: What does the statistic tell us about the parameter?
• Answer: Overall proportion of U.S. college students who binge drink is _______________________
• Question: Why would it be difficult to obtain the exact value of the parameter?
• Answer:
  • Population _________________________________________
  • Population is _______________________ _________________________________________

Ideas Behind Sampling
• Sampling Frame: list of cases from which a random sample is drawn; usually a large subset of the population
• Samples must be:
  • Representative: characteristics of the sample closely resemble the characteristics of the population
    • If not, the statistic could be biased (deviates significantly from parameter)
  • Selected Randomly: observations are chosen by chance rather than by deliberately and intentionally selecting specific cases
  • Large Enough: need enough information to understand the population

Good Sampling Methods
• Simple Random Sample: method of sampling in which every member of the sampling frame has the same chance of being chosen
• Systematic Sample: method of sampling where members are sampled according to some predetermined rule by skipping a certain number of people and then sampling the nth person
• Stratified Random Sample: method of sampling in which the population is first divided into groups (called strata) according to some characteristic and then a random sample is taken from within each stratum
• Cluster Sample: sampling method where the population is divided into similar groups (called clusters), a simple random sample of the clusters is taken, and then every member in each selected cluster becomes part of the sample

Example: Generating Samples
• Scenario: The United States has four different census regions: West, Midwest, Northeast, and South. Suppose we have a sampling frame of all 120 million households in the country. We want to learn what the average annual income is for United States families.
• Question: How can we generate each of the following random samples?
  • Simple random sample
  • Stratified random sample
  • Cluster sample
  • Systematic sample

Example: Simple Random Sample
• Question: How can we generate a simple random sample to collect data to estimate the average annual income of United States families?
• Answer:
  • Use a random number generator to select 2000 rows from the 120 million households
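The random number generator step in the answer above could look roughly like the following Python sketch. It assumes the sampling frame can be addressed by row number 0 to 119,999,999; the names FRAME_SIZE, SAMPLE_SIZE, and simple_random_sample are hypothetical and not part of the lecture.

```python
import random

# Assumption: each of the 120 million households is identified by its row number.
FRAME_SIZE = 120_000_000
SAMPLE_SIZE = 2_000

def simple_random_sample(frame_size, n, seed=None):
    """Pick n distinct row numbers so that every row has the same chance of selection."""
    rng = random.Random(seed)
    # random.sample draws without replacement; range() avoids building a huge list.
    return rng.sample(range(frame_size), n)

srs_rows = simple_random_sample(FRAME_SIZE, SAMPLE_SIZE, seed=1)
print(len(srs_rows))  # 2000
```

Fixing the seed only makes the illustration reproducible; a real study would not need it.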
• Question: How well will this sampling method work?
• Answer: Very well
  • Very likely to get a good mix of households from all census areas
  • Main risk is underrepresentation of one census area: an unlucky sample may yield only 100 cases from the West

Example: Stratified Random Sample
• Question: How can we generate a stratified random sample to collect data to estimate the average annual income of United States families?
• Answer:
  • Divide sampling frame into the four census areas
  • Take random sample of 500 households from each census area
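The two steps in this answer (divide, then sample within each group) can be sketched in Python as follows. The toy frame of 4,000 households with regions assigned in rotation is made up purely for illustration, and stratified_sample is a hypothetical helper name.

```python
import random
from collections import defaultdict

REGIONS = ["West", "Midwest", "Northeast", "South"]

def stratified_sample(frame, per_stratum, seed=None):
    """frame: iterable of (household_id, region) pairs.
    Returns a dict mapping each region to a random sample of its household ids."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for household_id, region in frame:      # step 1: divide the frame into strata
        strata[region].append(household_id)
    # step 2: take a separate random sample inside every stratum
    return {region: rng.sample(ids, per_stratum) for region, ids in strata.items()}

# Assumption: toy frame, 1,000 households per region (not real census data).
toy_frame = [(i, REGIONS[i % 4]) for i in range(4_000)]
strat = stratified_sample(toy_frame, per_stratum=500, seed=1)
print({region: len(ids) for region, ids in strat.items()})
```

Every region contributes exactly 500 households, which is the sense in which each census area is guaranteed to be represented.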
• Question: How well will this sampling method work?
• Answer: Extremely well
  • Obtain random samples from each census area
  • Guarantees each census area is well represented

Example: Cluster Sample
• Question: How can we generate a cluster sample to collect data to estimate the average annual income of United States families?
• Answer:
  • Choose a random sample of census areas (e.g. two of the four regions)
  • Sample every household in these two regions
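As a rough Python sketch of the cluster procedure just described, the code below randomly picks two regions and then keeps every household in them. The 4,000-household toy frame and the name cluster_sample are illustrative assumptions only.

```python
import random

REGIONS = ["West", "Midwest", "Northeast", "South"]

def cluster_sample(frame, n_clusters, seed=None):
    """frame: list of (household_id, region) pairs.
    Randomly choose n_clusters regions, then keep EVERY household in them."""
    rng = random.Random(seed)
    clusters = sorted({region for _, region in frame})
    chosen = set(rng.sample(clusters, n_clusters))
    households = [hid for hid, region in frame if region in chosen]
    return chosen, households

# Assumption: toy frame, 1,000 households per region (not real census data).
toy_frame = [(i, REGIONS[i % 4]) for i in range(4_000)]
chosen, households = cluster_sample(toy_frame, n_clusters=2, seed=1)
print(sorted(chosen), len(households))
```

Even in this toy frame, half of all households end up in the sample, which hints at the scale problem when clusters are as large as census regions.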
• Question: How well will this sampling method work?
• Answer: Terribly
  • Would require us to sample millions of households
  • Not feasible in this situation

Example: Systematic Sample
• Question: How can we generate a systematic sample to collect data to estimate the average annual income of United States families?
• Answer:
  • Choose every 60,000th observation from the sampling frame to obtain our 2000 cases
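A minimal Python sketch of the every-60,000th rule follows. The random starting row is an added assumption (a common way to avoid always beginning at the first row), not something stated on the slide, and systematic_sample is a hypothetical name.

```python
import random

FRAME_SIZE = 120_000_000   # households in the sampling frame
STEP = 60_000              # take every 60,000th row: 120,000,000 / 60,000 = 2,000 cases

def systematic_sample(frame_size, step, seed=None):
    """Return row numbers start, start + step, start + 2*step, ...
    Assumption: the starting row is chosen at random from the first `step` rows."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return list(range(start, frame_size, step))

sys_rows = systematic_sample(FRAME_SIZE, STEP, seed=1)
print(len(sys_rows))  # 2000
```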
• Question: How well will this sampling method work?
• Answer: Very well
  • Should be quite similar to a simple random sample

Bad Sampling Methods
• Convenience Sample: a sample obtained from the people who were easiest to access
• Voluntary Sample: a sample obtained when a large group of individuals is invited to participate and all responses received are recorded; statistician does little to no work other than offer the opportunity to participate
• Both of these methods can result in undercoverage, where some portion of the population is underrepresented or not represented at all in the sample.
  • Can be a large source of bias

Example: Convenience Sample
• Scenario: The Washington Capitals played the Vegas Golden Knights in the Stanley Cup finals last year. You ask everyone in your group of friends who they were rooting for.
• Question: Why is this a convenience sample?
• Answer: Responses come from the people who are easiest to access, but the sample does not necessarily reflect the opinion of the entire population
• Question: Why is this sample likely to be biased?
• Answer: You and your friends likely share similar opinions about sports teams, which may not mirror those of the entire hockey fanbase

Example: Voluntary Sample
• Scenario: After a political debate, CNN wants to gauge whether its viewers believed the Democratic candidate or the Republican candidate won the debate. They put a link to a poll on their website across the scroll at the bottom of the screen and invite the viewing audience to respond.
• Question: Why is this a voluntary sample?
• Answer: Respondents self-select into the study and no sampling is done by the network
• Question: Why is this likely to be biased?
• Answer: Two reasons…
  • CNN has a more liberal audience so sample doesn’t represent population
  • Only viewers who feel passionately will take time to vote

Example: Study Design
• Scenario: Three students want to estimate the proportion of college students who use a Mac:
  • A: Asks 10 friends and finds that 8 (80%) use a Mac
  • B: Observes a class of 300 students and finds that of the 200 who brought computers, 130 (65%) use a Mac
  • C: Randomly samples 300 students from the university, sends out a survey to each, and finds 120 of 200 who respond (60%) use a Mac
• Question: Which student’s sampling method is the best?
• Answer: Student C
  • Used simple random sample so sample is likely representative
  • A: Used convenience sample, not random, and used small sample size
  • B: Should have used cluster sampling by sampling more classes

Surveys
• Survey: method of data collection where respondents are asked questions and self-report responses on various topics
• Sources of Bias in Designing Survey Questions
  • Complicated questions
  • Vague concepts
  • Leading questions
  • Central tendency bias
  • Error prone response options
  • Voluntary response bias
    • Covered with the problem with voluntary samples

Example: Complicated Question
• Scenario: An online survey asked respondents:
  • “How likely are you to go out for dinner and a movie this weekend?”
  • Choose: Very likely/somewhat likely/not likely
• Question: Why is this a complicated question?
• Answer: Asking about two events in the same question
  • Respondents planning only one activity may answer “not likely”
• Question: How can this question be improved?
• Answer #1: Change wording and responses
  • Ask: “Which of the following do you plan on doing this weekend?”
  • Choose: Dinner and movie/movie/dinner
• Answer #2: Ask two different questions

Example: Vague Concepts
• Scenario: In the 1960s, a market research firm wanted to learn about stay-at-home moms’ favorite dish soap and asked the question, “What is your favorite soap?”
• Question: What do you think happened when the firm got its results?
• Answer: #1 response was “Days of Our Lives”
  • The term “soap” is ambiguous
• Question: What should have been done instead?
• Answer: Clearly define “soap”
  • Ask: “What is your favorite dish soap?”

Example: Leading Questions
• Scenario: Online survey question asked to gauge support for the Hall of Fame eligibility of athletes who used steroids:
  • “Should players who gained an unfair advantage in professional sports by using steroids be eligible to be enshrined into the Hall of Fame?”
• Question: What is wrong with this question?
• Answer: Two biased phrases
  • “Unfair advantage”: Implies cheating; “Enshrined”: Implies glorification
  • Writer clearly believes players on steroids receive unfair advantage
• Question: How can the question be reworded to eliminate bias?
• Answer: “Should players who use steroids be allowed in the Hall of Fame?”

Example: Central Tendency Bias
• Scenario: Huffington Post asked respondents:
  • “On a scale from 1 to 5 with 1 being strongly oppose and 5 being strongly support, what is your opinion on using the death penalty for people convicted of first degree murder?”
• Question: What response are people most likely to choose?
• Answer: 3
  • People tend not to take sides on controversial issues
• Question: How can the bias be eliminated?
• Answer: Use scale from 1 to 4 or 1 to 6
  • Forces respondents to choose a side

Example: Error Prone Response Options
• Scenario: One question in an exit poll after an election asked for the respondent’s age with the following answer choices:
  • Choose: 18-30/30-40/40-50/50-60/60+
• Question: What is the problem with these answer choices?
• Answer: Categories overlap
  • Two choices for 30, 40, 50, 60
• Question: How can this problem be fixed?
• Answer: Change categories
  • Choose: 18-29, 30-39, etc.

Example: Identify Issue in Survey
• Survey Question: Are you in favor of repealing Obamacare and replacing it with a new health care bill?
• Response Options: Yes/No
• Question: What type of issue exists with this survey question?
• Answer: Complicated question
  • Could favor repealing Obamacare, but not expanding to universal health care
  • Could support Obamacare, but want additional options in a new bill
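
The overlap problem from the Error Prone Response Options example can also be checked mechanically. The Python sketch below treats each answer choice as a closed age range and reports ages that fit more than one choice. The helper name overlapping_values is hypothetical, the fixed categories beyond 30-39 are filled in following the same pattern, and the upper limit of 120 for the "60+" category is an arbitrary assumption.

```python
def overlapping_values(bins):
    """Return the ages that fall into more than one closed range (low, high)."""
    seen, dupes = set(), set()
    for low, high in bins:
        for age in range(low, high + 1):
            if age in seen:
                dupes.add(age)
            else:
                seen.add(age)
    return sorted(dupes)

# Assumption: "60+" is capped at 120 only so the range is finite.
original_bins = [(18, 30), (30, 40), (40, 50), (50, 60), (60, 120)]
fixed_bins = [(18, 29), (30, 39), (40, 49), (50, 59), (60, 120)]

print(overlapping_values(original_bins))  # [30, 40, 50, 60]
print(overlapping_values(fixed_bins))     # []
```

The first result matches the four ages that have two possible choices in the original exit poll; the revised categories leave none.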