Unit 1 Tutorials: Key Principles of Statistical Methods

INSIDE UNIT 1

Statistics Fundamentals

Statistics Overview
Data
Qualitative and Quantitative Data
Discrete vs. Continuous Data

Sampling

Sampling
Random & Probability Sampling
Simple Random and Systematic Random Sampling
Stratified Random and Cluster Sampling
Multi-Stage Sampling

Experiments

Observational Studies and Experiments


Prospective and Retrospective Studies
Experimental Design
Randomized Block Design
Completely Randomized Design
Matched-Pair Design
Surveys
Blinding
Placebo

Data

Variables
Question Types
Accuracy and Precision in Measurements
Absolute Change and Relative Change
Using Percentages in Statistics
Index Number and Reference Value

Evaluating Studies

Bias
Nonresponse and Response Bias
Selection and Deliberate Bias
Convenience & Self-Selected Samples
Random and Systematic Errors
Margin of Error

Statistics Overview
by Sophia

 WHAT'S COVERED

© 2023 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 1
This lesson will provide you with an overview of what statistics really is by exploring:

1. Statistics
2. Types of Statistics

1. Statistics
You might be wondering, what is statistics? Is it some complicated formula? Is it some goofy graph that you really don't know that much about?

When people refer to statistics, they're usually referring to information called data that's been collected and synthesized within a statistical study, and sometimes presented
in a graphical form, like this.

While the image may be small and difficult to read, you get the idea that a LOT of information can be presented in the form of a graph.

It can also be presented numerically such as "The median household income in the United States is $46,326."

Video Transcription
[MUSIC PLAYING] The practice of statistics deals with these four concepts here. Collect, analyze, interpret, and present. You begin by collecting information from a
variety of sources. You then proceed to analyze that information that you've collected. After that, you interpret what that analysis means and then you present it in a
way that anyone can understand. And in this course you're going to learn how to do all those things, and if I may try to be honest-- though as a robot, I can't fully
experience the feeling of honesty-- I do understand statistics quite well.

And I must say it's a really neat way to describe our messy world. It's not pretty all the time, but statistics allow us a way to simplify things.

[MUSIC PLAYING]

 STEP BY STEP

The practice of statistics deals with four main steps:

1. Collect. Collect the information from a variety of sources


2. Analyze. Analyze the information that you've collected
3. Interpret. Interpret what that analysis means
4. Present. Present it in a way that anyone can understand

Statistics is a neat way to describe a messy world. It's not pretty all the time, but statistics gives us a way to simplify things.

 TERM TO KNOW

Statistical study
A way to collect information from individuals

2. Types of Statistics
When you use descriptive statistics, you are going to analyze what's going on at a particular point and use statistics to describe the information that you've obtained.

On the other hand, when you use inferential statistics, you are going to use statistics that you've obtained and make a generalization about the population at large.

IN CONTEXT
Let's say that you read the newspaper this morning and discovered that the average household income in the United States was reported to be $46,700.

This information didn't come from surveying every household in the United States. It wouldn't be realistic or feasible to knock on every door and speak to all
those people. But someone arrived at this number. So, how did they get it?

Well, a sample was taken, and a generalization was made about the entire United States based on that sample.

This is inferential statistics.
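
To make the difference concrete, here is a minimal sketch in Python. The incomes are made-up numbers generated by the computer (they are not real household data), and the population size, sample size, and seed are all assumptions chosen only for illustration. The sample mean describes the 200 households we measured (descriptive), and using it as an estimate for all 10,000 households is the inferential step.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up household incomes (not real data).
rng = random.Random(42)
population = [rng.lognormvariate(10.7, 0.6) for _ in range(10_000)]

# Descriptive statistics: describe only the group we actually measured.
sample = rng.sample(population, 200)
sample_mean = statistics.mean(sample)

# Inferential statistics: treat the sample mean as an estimate of the
# population mean, which in real life we usually cannot compute directly.
population_mean = statistics.mean(population)

print(f"Sample mean (describes 200 households): {sample_mean:,.0f}")
print(f"Population mean (normally unknown):     {population_mean:,.0f}")
```

The two numbers will usually be close but not identical, which is exactly why a generalization from a sample is an estimate rather than a fact.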

 TERMS TO KNOW

Descriptive statistics
Using only the information at hand to describe the selected group of individuals.

Inferential statistics
Using the information at hand to make a larger, more general statement about the entire population of individuals.

 SUMMARY

Statistics allows us to synthesize the information we get from the world around us. There are two types of statistics. Descriptive statistics describe information
gathered at a particular point. Inferential statistics use the information gathered from a sample to make a generalization or prediction about the population.

Good luck!

Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS. bar chart, CC,
https://en.wikipedia.org/wiki/Chart#/media/File:Black_cherry_tree_histogram.svg no modifications made

 TERMS TO KNOW

Descriptive statistics
Using only the information at hand to describe the selected group of individuals

Inferential statistics
Using the information at hand to make a larger, more general statement about the entire population of individuals

Statistical analysis
All the ways of collecting, analyzing, and interpreting the data

Statistical study
A way to collect information from individuals

Statistics
The study of collecting, analyzing, interpreting, and presenting information

Data
by Sophia

 WHAT'S COVERED

This lesson will introduce the collection and evaluation of data including:

1. Defining Data
2. Evaluating Types of Data
3. Gathering Data

1. Defining Data
Data is the pieces of information that we use in order to answer some statistical question. It could be a number or an attribute.

But ultimately, data are the pieces of information that we use to get a more accurate picture of a scenario. Every piece of data helps us get a more accurate description, which
raises the question: how do you obtain data? Where does it come from? Do you just make it up? Where do you find data?

 TERM TO KNOW

Data
Information used in a study to answer a statistical question.

2. Evaluating Types of Data


There are two types of data to serve your purposes. It's possible that the easier route is to go with something someone else has already done. Available data is data that
has already been collected by somebody.

Now, who collects data? Well, a lot of places collect data, such as:

Government organizations
Polling organizations
News sources
Government entities
Private entities

The vast majority of sources are trustworthy. However, when using available data, it's important to think critically about what the information is trying to convey. It’s
essential to break apart the information and ask yourself these questions:

Who collected it?


Are they reputable?
Are they trustworthy?
When was it collected?
How was it collected?
Why did they collect it?

So, how do you know when you need to gather the information yourself? Data that you gather yourself is called raw data. Obviously, if the population in the available data doesn’t match your
topic of interest, then that data is of no value to you, so you need to gather your own.

But what about less obvious characteristics such as whether or not a source has an agenda? This is a key point here. Having an agenda, whether intentional or not, can
introduce what's called bias.

Often, polling organizations, news organizations, and government entities try to do the best job they can to get relevant information. Bias is not usually introduced
intentionally, but sometimes it is, when a source is trying to push some kind of agenda.

 TERMS TO KNOW

Available Data
Data collected by some other entity - a government organization or private company.

Raw Data
Unorganized, unprocessed, and not summarized. Typically, this is data that is not already available.

Bias
The systematic favoring of certain outcomes in a study. There are many ways to introduce bias into a study.

3. Gathering Data
If you choose to collect your own data, you must think critically and ask yourself these questions:

Who will receive this data?


For whom is the data intended?
How will you and others gain access to it?

Collecting data is important because it's the source of statistics. Think about data as the raw material for creating something useful. If you collect your data well, the statistics
are going to be accurate. If you collect your data poorly, then your statistics will be poor. There's no rescuing that.

 BIG IDEA

You can't make useful statistics out of poor data. Thinking critically will help you determine which type of data should be used for your purposes.

 SUMMARY

This tutorial defined data as “information used in a study to answer a statistical question.” We discussed how to evaluate types of data, available or raw, and
questions focusing on the who, what, why, and how should be posed to help identify bias. When gathering your own data, it’s important to understand your
audience and consider how they will gain access to all your hard work.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Available Data
Data collected by some other entity - a government organization or private company.

Bias
The systematic favoring of certain outcomes in a study. There are many ways to introduce bias into a study

Data
Information used in a study to answer a statistical question

Raw Data
Unorganized, unprocessed, and not summarized. Typically, this is data that is not already available

Qualitative and Quantitative Data
by Sophia

 WHAT'S COVERED

In this tutorial, you're going to learn about the difference between qualitative data and quantitative data by examining:

1. Qualitative Data
a. Nominal Measurements
b. Ordinal Measurements
2. Quantitative Data
3. Qualitative and Quantitative Data in Practice

1. Qualitative Data
Qualitative data is also often called “categorical data”. It is not numerical in the sense that we can do numerical operations with it, like adding numbers together or finding
an average, but rather, it fits into categories.

 EXAMPLE Gender: male and female. That's a qualitative variable with two categories.
Letter grades and zip codes are also qualitative. Zip codes are made up of digits, but you wouldn’t do arithmetic with them. You wouldn’t find an average zip code, for instance. The
purpose of zip codes is to divide areas into categories. Hair color is another example of qualitative data because you can put those with black hair in one group and those with
blonde hair in another group.

It's important to know that qualitative data can be divided further into two categories:

Nominal Measurements
Ordinal Measurements

 TERM TO KNOW

Qualitative/Categorical Data
Data whose values are the names of categories. These can be numbers, but not the kinds of numbers with which it makes sense to do any numerical operations.

1a. Nominal Measurements


 EXAMPLE Favorite color. The order of the listed categories makes no difference. It doesn't matter if you put the colors below in the order of the color spectrum
or not.

With nominal data, it only makes sense to reference which category has the largest frequency. In this case, let’s say most people say that green is their favorite color. That
is what you would report and it doesn’t matter that green is the 4th box from the left.
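
As a small illustration of "which category has the largest frequency," here is a sketch in Python. The survey responses are made up for this example. It simply counts how often each color appears and reports the most common one.

```python
from collections import Counter

# Made-up survey responses, for illustration only.
favorite_colors = ["green", "blue", "green", "red", "green", "blue", "yellow"]

# Count how many times each category appears.
counts = Counter(favorite_colors)

# With nominal data, the category with the largest frequency is what we report.
most_common_color, frequency = counts.most_common(1)[0]
print(most_common_color, frequency)  # green 3
```

Notice that the code never uses the position of a color in the list. With nominal data, only the counts matter.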

 TERM TO KNOW

Nominal Level of Measurement


Qualitative data where the order in which the categories are presented does not matter.

1b. Ordinal Measurements


 EXAMPLE Rating scale. The order of the listed categories is very important because the order is associated with a type of value. It’s very important that you
don’t mix up the order here because the circle on the furthest left indicates you are feeling no pain.

Pain Scale
❍ ❍ ❍ ❍ ❍ ❍ ❍
Left to right: No Pain, Moderate Pain, Worst Pain

With ordinal data, it’s important to keep the categories in order so that they express a spectrum ranging from lowest to highest, or worst to best, as ratings do.
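
Because ordinal categories have an order, you can sort responses and ask which response falls in the middle, something that would be meaningless for favorite colors. The sketch below is in Python. It assumes the seven circles are coded as positions 0 through 6 from left to right, and the responses are made up for illustration.

```python
# Positions 0..6 stand for the seven circles on the pain scale, left to right.
# Only three positions carry a label on the scale (coding them 0-6 is an assumption).
LABELS = {0: "No Pain", 3: "Moderate Pain", 6: "Worst Pain"}

# Made-up responses from seven patients.
responses = [0, 3, 6, 3, 2, 5, 3]

# Sorting only makes sense because the categories have a meaningful order.
ordered = sorted(responses)
middle_response = ordered[len(ordered) // 2]

print(ordered)
print(middle_response, LABELS.get(middle_response, "(unlabeled circle)"))
```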

 TERM TO KNOW

Ordinal Level of Measurement


Qualitative data where the order in which the categories are presented matters.

2. Quantitative Data
On the other hand, you have quantitative data. Quantitative data are expressed numerically. It makes sense to do numerical operations with it, like finding averages or
adding them together.

Examples of quantitative data include:

Weight
Commute time to work
Outdoor temperature

All of these are measured in numbers. It makes sense to find, for instance, averages of these. So you can do numerical operations with them.
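
Here is a brief sketch in Python of that idea. The commute times are made-up values, and the zip codes are included only for contrast. Averaging the commute times gives a meaningful answer, while the zip codes are stored as text on purpose so that nobody averages them by accident.

```python
import statistics

# Quantitative: made-up commute times in minutes. Averaging them is meaningful.
commute_minutes = [30.1, 32.3, 25.0, 41.5, 28.6]
mean_commute = statistics.mean(commute_minutes)
print(f"Average commute: {mean_commute:.1f} minutes")

# Qualitative: zip codes look like numbers but act as category names.
# Keeping them as strings (a design choice in this sketch) avoids accidental arithmetic.
zip_codes = ["55401", "10001", "94105"]
print(sorted(set(zip_codes)))
```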

It's important to note that qualitative data is displayed differently than quantitative data. The statistical operations that make sense also depend on the type of data that we have.

 TERM TO KNOW

Quantitative Data
Data whose values are numbers with which it makes sense to do numerical operations.

3. Qualitative and Quantitative Data in Practice


Determine if each situation is qualitative or quantitative data.

Video Transcription
[MUSIC PLAYING] Here we have some examples to help you understand the differences between qualitative and quantitative data. So first, we have blood type.
That's going to be an example of qualitative data. It's a description. It's telling you something about yourself, but it's not something that can be added, or subtracted,
or used for arithmetic even.

On the other hand, number of kids is quantitative data. Or how about a phone number? So even though it's a number, it's still qualitative data because, really, who
would ever add or subtract those values? So what's an example of quantitative data? How about something like income? Income is quantitative data because, again,
it's a value that's giving us a quantity. It's telling us how much money you make, and that's a value you could add, subtract, and use to find the mean and other numerical
measures.

[MUSIC PLAYING]

 SUMMARY

Data used in statistics falls under one of two broad classifications: categorical, which is also called “qualitative,” or numerical, which is also called “quantitative.”

Qualitative data branches out even further into either nominal, which means that only the names of the categories are important, or ordinal, which means the order of the categories is also important.

With quantitative data, it must make sense to do numerical operations with the values. Qualitative and quantitative data are treated differently when organizing graphical displays and applying statistics to
them.

Good luck!

Source: This work is adapted from Sophia author Jonathan Osters.

 TERMS TO KNOW

Nominal Data
Categorical data with qualities that cannot be ordered or ranked.

Ordinal Data
Categorical data with qualities that can be ordered or ranked.

Qualitative (Categorical) Data


Data that describes. It can't be measured or used for arithmetic.

Quantitative (Numerical) Data


Data that is numerical. It can be measured and it can be used for arithmetic.

Discrete vs. Continuous Data
by Sophia

 WHAT'S COVERED

This tutorial will discuss two types of quantitative data and how to tell them apart:

1. Discrete Data
2. Continuous Data
3. Discrete and Continuous Data in Practice

1. Discrete Data
Both discrete and continuous data are numerical, or quantitative, data, but discrete data can only take on certain values within a range. An example of discrete data would be the number of
pets that someone has. That can only take whole number values. You can't have half of a pet.

Other examples are the number of rail cars on a train and shoe sizes. Now, you can have half sizes in shoes, but that's all you can have. You can't have quarter sizes, eighth
sizes, or 0.01 sizes. You can't say that you're a size 9 and an eighth. So there are only certain values that shoe size can take. That makes it discrete.

 TERM TO KNOW

Discrete Data
Data that can only take so many different values.

2. Continuous Data
Now the difference between discrete and continuous is that continuous data can take any value within a range. Some examples of data that are continuous are temperature,
commute time, and weight. Each of these can take on any value within a range. So for instance, suppose you're talking about daytime temperature.

The daytime temperature could be something between 50 and 80 degrees on a summer's day, and it takes on any value between those. Same with commute time. One
day it might take you 30 minutes and five seconds to get to work. The next day it might take you 32 minutes and 17 seconds.

And weight: one person might weigh 150.75 pounds, and another might weigh 102.62 pounds. Continuous data can take on any value within a spectrum, whereas discrete
data can only take certain values within a spectrum.
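
One way to see the difference is to write a check for whether a value is even possible. The sketch below is in Python. The function name `is_possible_shoe_size` is made up for this example, and it assumes the whole-and-half-size rule described earlier. No such check exists for weight, because any value in the range is possible.

```python
def is_possible_shoe_size(size: float) -> bool:
    """Return True if `size` is a whole or half size (assumed sizing rule)."""
    # Doubling a whole or half size always gives a whole number.
    return float(size * 2).is_integer()


print(is_possible_shoe_size(9.5))    # True: half sizes exist
print(is_possible_shoe_size(9.125))  # False: there is no size "9 and an eighth"

# Continuous data has no such restriction: any weight within a range is possible,
# including values that fall between two other measurements.
weights_in_pounds = [150.75, 102.62, 150.751]
print(sorted(weights_in_pounds))
```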

 TERM TO KNOW

Continuous Data
Data that can take any value within an interval.

3. Discrete and Continuous Data in Practice


Determine if each situation is discrete or continuous.

Video Transcription
[MUSIC PLAYING] Now, let's take a look at a few examples and determine if a situation is discrete, or continuous. The time it takes to complete a race-- is this
discrete or continuous? The time to complete a race or any task is continuous data. Time can take on any value. You can measure the time it takes to finish a race in
hours, minutes, seconds, even fractions of a second.

The number of pairs of shoes you own-- discrete or continuous? This is discrete. You can't have half a pair. OK, I suppose if you lose a shoe, you can have half a pair.
But then, it's no longer a pair. Am I right? Whatever the case, your number of pairs of shoes is not any number within a certain range. Your number of shoes is a
specific whole number, which is therefore a discrete number.

The time it takes for a light bulb to burn out-- is this a discrete or continuous number? This would be continuous data. It could take any length of time for your light
bulb to burn out, from 0 seconds up to many years. How about the number of green chocolate candies in a bag? Is that discrete or continuous? If you said discrete,
you're correct. You typically would be dealing with only whole number values, unless the poor bag of candy is crushed.

Barometric pressure-- is this discrete or continuous? You should have said that barometric pressure is continuous because it can take any value within a certain
range, usually somewhere around 30 inches Hg.

 TRY IT

Determine if the following are discrete or continuous.

Is barometric pressure discrete or continuous?

You should have said that barometric pressure is continuous because it can take any value within a certain range, usually somewhere around 30 inches Hg.

Is the number of pairs of shoes someone owns discrete or continuous?

Discrete. You can't have half a pair (well, you can have half a pair of shoes if you've lost one), and you can't have just any number of pairs of shoes within a certain
range. Typically, it takes only whole number values.

Is the time for a light bulb to burn out discrete or continuous?

That's continuous. It could take any length of time from zero seconds all the way to a couple of years.

Is the number of green M&Ms in a bag discrete or continuous?

Discrete. Typically, again, we're dealing only with whole number values.

 SUMMARY

Quantitative data can be broken down into two subcategories. Continuous data can take on any value within a range, and discrete data can only take on certain values.
Every quantitative measurement that we get is going to be either continuous or discrete. This tutorial also put discrete and continuous data in practice
to allow for some application!

Good luck!

Source: This work is adapted from Sophia author Jonathan Osters.

 TERMS TO KNOW

Continuous Data
Data that can take any value within an interval.

Discrete Data
Data that can only take so many different values.

Sampling
by Sophia

 WHAT'S COVERED

In this tutorial, you're going to learn all about sampling, focusing on:

1. Population and Census


2. Sample

1. Population and Census


Sampling always starts with a population. A population is the complete set of all the things that are being studied.

Typically, the population that we wish to generalize our findings to is something like the population of the United States, the population of the world, or the population of a state.
Examining all members of a population is called a census. For populations this large, a census may not be feasible, so the hope is that a smaller group of people can represent the
population.

Since the group of people from the United States seems like too big of an example, a smaller example of billiard balls will be demonstrated. As you see in the image below,
the complete set of things in this particular example are the 15 billiard balls on a pool table.

With a group so small, it's possible to take all of them and define some attribute of them like color, or weight, or what have you--whether they're striped or solid, there are
lots of different ways that you could describe each pool ball. And it's easy enough just to take the entire population and examine all of them.

 TERMS TO KNOW

Census
Using the entire population to obtain data.

Population
The entire set of individuals from which to sample.

2. Sample

When you think about the United States example, you can see that a census is not always feasible. Suppose your population is a large group of people, much larger than 15
people. It might be hard to get answers from everybody.

What you might choose to do is take a small subset of those individuals and make a sample. In this case, perhaps seven of these many individuals in the population were
chosen. A sample is a subset of the population and you would obtain data from that subset and leave everyone else out.

From that sample, you would obtain your data and calculate your statistics. Ideally, the sample is a small version of the population, a
microcosm of it, such that the statistics you calculate from the sample data are about the same as what you would have gotten if you
had measured the population directly. That's what we mean when we say that we want the sample to be a representative sample of the population.

There are ways to make it very likely that a sample will be representative. One way is to take the entire population and put them in a hat.

Now again, this is a lot easier with billiard balls than it is with people. But imagine putting all the billiard balls into the hat.

Let’s say you shake up the hat, and take out a sample of five.

There are also ways to guarantee that you won't get a representative sample. Suppose you specifically cherry-picked only solid-colored billiard balls. Well, that wouldn't
be very representative of the population of 15.

 THINK ABOUT IT

Is it possible that when you take that hat and pull out five billiard balls that all five of them are solid? Sure, that's possible, it's just not all that likely. If you cherry-pick,
that's not a good idea because you're getting a sample that's specifically not representative.
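
How unlikely is "not all that likely"? Here is a rough calculation in Python. It assumes that 8 of the 15 balls count as solid (balls 1 through 7 plus the black 8-ball), which is an assumption about the set rather than something stated in this lesson, and that every group of five is equally likely to come out of the hat.

```python
from math import comb

TOTAL_BALLS = 15
SOLID_BALLS = 8   # assumption: balls 1-7 plus the black 8-ball
SAMPLE_SIZE = 5

# Number of different groups of five that could come out of the hat.
all_samples = comb(TOTAL_BALLS, SAMPLE_SIZE)

# Number of those groups made up entirely of solid balls.
all_solid_samples = comb(SOLID_BALLS, SAMPLE_SIZE)

p_all_solid = all_solid_samples / all_samples
print(all_solid_samples, "out of", all_samples)  # 56 out of 3003
print(f"{p_all_solid:.4f}")                      # 0.0186
```

Under those assumptions, an all-solid sample turns up a little less than 2 times in 100 random draws. Cherry-picking makes it happen every time.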
 TERMS TO KNOW

Sample/Sampling
A subset of the population. There are many ways to select a sample.

Representative Sample
A sample that accurately reflects the population.

 SUMMARY

A census is a way of collecting data that uses everybody. And a sample only uses some. To generalize the findings from the sample to the population at large, it
has to be representative of your population at large. Once again, the terms that we've described in this tutorial are population, census, the noun sample, and the
verb sampling, and the idea that a sample should be representative.

Good luck!

Source: This work is adapted from Sophia author Jonathan Osters.

 TERMS TO KNOW

Census
Using the entire population to obtain data

Population
The entire set of individuals from which to sample

Representative Sample
A sample that accurately reflects the population

Sample/Sampling
A subset of the population. There are many ways to select a sample.

Random & Probability Sampling
by Sophia

 WHAT'S COVERED

This tutorial covers random and probability sampling methods, focusing on:

1. Random Sample
2. Probability

1. Random Sample
The term “random” is used a lot in everyday speech, but what does it mean when it comes to statistics? In statistics, random refers to something that is unpredictable and
does not have a recognizable pattern.

With a random sample, every member of the population has the same chance of getting selected. This is the best way to get a representative sample. Recall that a
representative sample is one where the population and the sample share the same relevant characteristics.

If you want a random sample, you would need to select participants in such a way that every member of that population has an equal chance of being selected for the
sample. This is also known as random selection.

You need to come up with a method to achieve a random sample, and you can do that with a probability sampling plan. This plan must be made first before a random
sample can be taken. You can also “weight” certain people so that they might be more likely to be selected for the sample, too.

IN CONTEXT
What does a random sample look like in context? Suppose there are 15 billiard balls from a pool table:

You place them all in a hat, and you shake the hat, and voila, here's a sample of five.

Shake #1

We got ball numbers 1, 5, 7, 10, and 14.

Suppose you place the billiard balls back in the hat and shake the hat for a second time.

Shake #2

This is another sample of five, and it is not that different from the previous sample. If you repeated the same hat trick over and over again, each ball would have an
equal chance of being pulled every time.

Let's shake the hat for a third time.

Shake #3

What happened here was we got balls 9, 11, 12, 13, and 14--all of which happened to be striped billiard balls. No solids. If you only had access to this information,
you might be led to believe that all the balls in the hat were striped, which wouldn't be the case.

This may seem odd, but it can certainly happen even though you selected the balls randomly by following a probability sampling plan. The reason is that this sample of
five is just as likely as any other sample of five to be chosen.
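
If you don't have a hat handy, the same "shake and pour out five" idea can be sketched in Python. This is only an illustration. The seed value is arbitrary, and the samples you get will not match the ones pictured in this lesson.

```python
import random

# The 15 billiard balls, numbered 1 through 15.
balls = list(range(1, 16))

# A seeded generator so the sketch gives the same result each run (seed is arbitrary).
rng = random.Random(0)

for shake in range(1, 4):
    # random.sample draws without replacement, so no ball appears twice in one shake.
    sample = sorted(rng.sample(balls, 5))
    print(f"Shake #{shake}: {sample}")
```

Each shake starts again from all 15 balls, just like putting the balls back in the hat before shaking again.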

 TERMS TO KNOW

Random Sample
A sample that has been selected in a manner where every member of the population has some predetermined chance of being selected for the sample.

Random Selection
The method of obtaining a random sample.

Probability Sampling Plan


The way to collect a random sample that guarantees a certain likelihood for each member of the population to be selected.

Might you get something that's unrepresentative? Yes. But the vast majority of the time, it will be representative.

 SUMMARY

The best method for selecting a sample that's representative is a random sample and a probability sampling plan. Now, this won't always get you a representative
sample. But often, you will get one when you do random samples.

Good luck!

Source: This work is adapted from Sophia author Jonathan Osters.

 TERMS TO KNOW

Probability Sampling Plan


The way to collect a random sample that guarantees a certain likelihood for each member of the population to be selected

Random Sample
A sample that has been selected in a manner where every member of the population has some predetermined chance of being selected for the sample

Random Selection
The method of obtaining a random sample

Simple Random and Systematic Random Sampling
by Sophia

 WHAT'S COVERED

This lesson will explain how to ensure everyone in the population has an equal chance of participating in a sample, specifically focusing on:

1. Simple Random Sample


a. Random Number Generator
b. Random Number Table
2. Systematic Random Samples

1. Simple Random Sample


A Simple Random Sample (SRS) is a sampling method that not only ensures that everyone in the population has an equal chance of being in the sample, but also that
every sample is equally likely to be the sample that's being selected.

If you’ve ever experienced a raffle situation, you’ve experienced a simple random sample. What generally happens at these events is that someone collects the raffle tickets
and puts them into a bucket.

The tickets are mixed up in the bucket, and one ticket is pulled out. The owner of that ticket usually wins some kind of fantastic prize. Now, being in a simple random
sample is pretty much the same thing. The only difference is that instead of winning the prize, you get to be part of the sample and that's your prize.

IN CONTEXT
Suppose you take billiard balls from a pool table and put those all into a hat.

Next, shake it up, and pour out five billiard balls. Do this for two shakes.

Shake #1 Shake #2

You may have noticed that the solid, yellow “1” ball was in both of these first two examples. However, it doesn't mean it's any more likely to be selected than any
of the other balls. It's the same likelihood. Any sample of five, whether the first sample or the second, was an equally likely sample of five.

Let's shake the hat for a third time.

Shake #3

Now, notice, all five of these were striped billiard balls, not one solid ball in the bunch. Is that unusual? Sure, it's kind of unusual to happen. Unusual samples
have an equal likelihood to happen too. Just because they're strange and don't happen very often doesn't mean they can't happen. In fact, they have the same
likelihood as any other selection of five.
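
To see what "the same likelihood as any other selection of five" means, you can list every possible sample. The Python sketch below assumes the striped balls are numbers 9 through 15, which is the usual numbering but is not stated in this lesson.

```python
from itertools import combinations

balls = range(1, 16)
striped = set(range(9, 16))  # assumption: balls 9-15 are the striped ones

# Every possible sample of five balls. In a simple random sample,
# each of these is equally likely to be the one that comes out of the hat.
all_samples = list(combinations(balls, 5))
print(len(all_samples))  # 3003

# One specific all-striped sample is just 1 of those 3003 possibilities.
print((9, 11, 12, 13, 14) in all_samples)  # True

# Samples in which every ball is striped.
all_striped = [s for s in all_samples if set(s) <= striped]
print(len(all_striped))  # 21
```

So any one particular sample has a 1 in 3,003 chance, whether it looks ordinary or strange. All-striped samples feel unusual only because just 21 of the 3,003 possible samples are all striped.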

Therefore, knowing how to take a Simple Random Sample, abbreviated SRS, is important because most of the inferences that we make about a population assume that the data were
collected in this way. So names in a hat are fine. In our case, raffle tickets in a bucket, or billiard balls in a hat...that's all fine.

 TERM TO KNOW

Simple Random Sample (SRS)


A method of selection that guarantees that every sample of a certain size has an equal chance of being the selected sample

1a. Random Number Generator


However, what about the situations where we don’t have the manpower to pull numbers or names from a hat? There are two other ways to take a simple random sample.
One way is using a random number generator and the other is a random number table. First, we are going to discuss the random number generator.

 EXAMPLE Suppose that we want to take a sample of 100 individuals from a population of 2,000 people. Below you will see some of those individuals lined up,
and you can imagine that individuals 10 through 1,995 are somewhere in the middle. Each is assigned a unique number so no one can have the same number as
anybody else.

Using technology, you can search "random number generator" on the internet, and several websites will come up. Or, you can use a calculator. The model shown here is a
Texas Instruments calculator:

“RandInt” stands for “random integer” (an integer is a whole number). In the first entry, the integers run from 0 to 1, so the calculator picks either 0 or 1. The third number
tells the calculator how many integers you want; in this case, five were requested. For this sample, though, you don't want numbers between 0 and 1, and you don't want
five of them. You want numbers between 0 and 2,000, and you want 100 of them. So why was 150 entered when you only want 100 numbers?

You can’t select one person twice, so repeats must be ignored. It's incredibly likely that if you had just written 100 instead of 150, there would have been at least one repeat
in the bunch.
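
To see why a repeat is so likely, here is a rough calculation in Python. It assumes 2,000 equally likely numbers and 100 independent draws; the function name is made up for this sketch.

```python
def prob_at_least_one_repeat(draws, population_size):
    """Chance that `draws` independent random picks contain at least one repeat."""
    p_all_distinct = 1.0
    for i in range(draws):
        # The (i+1)-th pick must avoid the i distinct numbers already seen.
        p_all_distinct *= (population_size - i) / population_size
    return 1.0 - p_all_distinct

p = prob_at_least_one_repeat(100, 2000)
print(f"Chance of at least one repeat in 100 draws: {p:.0%}")
```

Under these assumptions the chance comes out to a little over 90%, which is why generating extra numbers is a sensible precaution.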

Finally, you're going to select the individuals that correspond to those first 100 different numbers that were picked.

So, person number 8, and the person that corresponds to 1,119, and the person who corresponds to 1,996 are a few that are chosen. Now, notice that the person
corresponding to 8 was chosen again--you can see that it’s listed twice in the list. You're not going to select that person twice because they've already been selected once,
so they are crossed out. This is the reason 150 numbers were created, so you have room to cross repeats out.
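
The same procedure can be sketched in Python instead of on a calculator. This is a hedged illustration, not the calculator's method: it assumes individuals are numbered 1 through 2,000, and `draw_sample_ids` is a hypothetical helper name.

```python
import random

def draw_sample_ids(population_size, sample_size, draws, rng):
    """Generate `draws` random integers, cross out repeats, and keep
    the first `sample_size` different numbers."""
    # Assumption: individuals are numbered 1..population_size.
    generated = [rng.randint(1, population_size) for _ in range(draws)]
    chosen = []
    seen = set()
    for number in generated:
        if number in seen:
            continue  # repeat: this person was already selected, so skip it
        seen.add(number)
        chosen.append(number)
        if len(chosen) == sample_size:
            return chosen
    raise ValueError("too many repeats; generate more numbers")

sample = draw_sample_ids(2000, 100, 150, random.Random())
print(len(sample), "people selected; first few:", sample[:5])
```

Generating 150 numbers leaves room for up to 50 repeats, far more than are likely to occur.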

 TERM TO KNOW

Random Number Generator


A method of collecting a sample that utilizes technology to select random numbers corresponding to individuals in the population.

1b. Random Number Table


Using a random number table is basically the same idea, though it is a little bit more cumbersome. It’s generally used if no technology is available. You will
soon notice that this is a long process, more time-consuming than using a random number generator.
Each individual is assigned a unique number, just as with the random number generator; however, each individual's number must have the same number of digits.

The same numbering as for the random number generator cannot be used, because the number 2,000 has four digits, and the number 1 only has one digit. All of these must
have the same number of digits, so instead of 1, it's 0001. Instead of 2, it's 0002, and so forth, all the way up to 2000. A table of random digits can be found in a textbook
or online. Four digits will be read at a time because each individual's number has four digits.

 EXAMPLE Suppose the first four numbers found were 1-9-2-2. That corresponds to someone in the list. There is someone who is 1,922 so that individual will be
selected for the sample. It’s circled in green below since a person corresponds to that number. The next number found is 3-9-5-0. No one on the list
corresponds to the number 3,950, so it is ignored. The next number, 3-4-0-5, does not correspond to an individual either, so that is ignored as well.

You'll notice that all numbers circled in red are numbers that are unassigned in our list. This is going to make this a very cumbersome process. It will go for a while until 100
individuals are obtained. Will this work? It will work, but it might take a very long time.

One of the numbers circled in green is 0001. This is the very first person on the list, and it just happens that person 0001 will be among the sample. This individual will be
selected along with everyone else whose four-digit number was selected.
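
Reading a random number table can be sketched as follows. The row of digits below strings together the four-digit groups mentioned in this example; their order in a real table is an assumption, and `read_table` is a hypothetical helper name.

```python
def read_table(digits, population_size, sample_size, width=4):
    """Read a row of random digits `width` at a time, keeping only numbers
    that are assigned to someone and have not been picked already."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop spaces
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)  # corresponds to a person: select them
        if len(chosen) == sample_size:
            break
    return chosen

row = "1922 3950 3405 0001"
print(read_table(row, population_size=2000, sample_size=100))  # [1922, 1]
```

Only 2,000 of the 10,000 possible four-digit groups (0000 through 9999) are assigned to someone, so roughly four out of every five groups read from the table are skipped. That is why the process takes so long.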

 TERM TO KNOW

Random Number Table


A method of collecting a sample that uses a table of random digits to select numbers corresponding to individuals in the population. Each individual is assigned a number,
and numbers are then selected from the table.

2. Systematic Random Sampling


There is one thing to know about systematic sampling right off the bat: it is not inherently random. You have to be very careful about this. A systematic random sample
involves choosing a value, "k," and then selecting every "k"th individual in the population, similar to elementary school when you counted off
by 3’s to create teams.

The value of "k" can be anything. You could choose every second individual, in which case all the green people are in, and all these black stick figures are out. Or, you
could do every third person, where one person is in and then skip two; then the fourth person is in and skip two. Or, you could go every fourth person.

Often people prefer systematic samples to simple random samples because systematic samples are so much easier to take. It's easier than getting a whole list of people
and assigning everyone a number or putting all the people's names in a hat. It's easier to take every fifth person or whatever you decide "k" should be.

 HINT

The nice thing about a systematic sample is that it can be tailored to fit your sample size. If you wanted a sample of 25 from 500 individuals, you could sample every
20th person since 500 divided by 25 equals 20. So you would obtain your sample of 25 by sampling every 20th person.
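
The arithmetic in this hint can be sketched in Python. The function name is made up for this illustration, and it follows the lesson by starting the count at the "k"th individual; many textbooks instead pick a random starting point between 1 and "k".

```python
def systematic_sample(population, sample_size):
    """Select every k-th individual, where k = population size // sample size."""
    k = len(population) // sample_size
    # Start at the k-th individual (index k - 1) and step by k.
    return population[k - 1::k][:sample_size]

people = list(range(1, 501))            # 500 individuals, numbered 1-500
sample = systematic_sample(people, 25)  # k = 500 // 25 = 20
print(sample[:3], "...", sample[-1], "| size:", len(sample))

classroom = list(range(1, 21))          # 20 students, choose 5, so k = 4
print(systematic_sample(classroom, 5))  # [4, 8, 12, 16, 20]
```

Note that the result is only as random as the ordering of the list it is applied to.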

IN CONTEXT
Suppose that you have 20 students in a class, and they're in rows, assigned to their desks randomly. If that were the case, you could count off every fourth
student and have five students go up to the chalkboard to do a homework problem on the chalkboard.

1 2 3 ✘ 5

6 7 ✘ 9 10

11 ✘ 13 14 15

✘ 17 18 19 ✘

So, persons one, two, and three don't have to do it. Person number four heads up to the chalkboard to work on a problem. Five, six, and seven don't have to do it,
but number eight does. The ✘ marks indicate the pattern and show who needs to go up to the chalkboard.

What if they were alphabetized instead of randomly assigned?

Abbott Acosta Adams Adamson ✘ Adler

Anderson Bueller Frye ✘ Grey Jones

McClurg Morris ✘ Peterson Pickett Rooney

Ruck ✘ Sara Sheen Stein Ward ✘

By selecting say, Adamson, you automatically know who all the rest of the people are going to be. Since Adler is right next to Adamson, you know that Adler won't
get chosen. Nor will Anderson or Bueller, but Frye will.

If these students had been randomly assigned to their seats, the seat position chosen would not tell you in advance which students would end up in the
sample. With the seats alphabetized, however, picking Adamson predetermines everyone else, which undermines the randomness of the selection process.

 TERM TO KNOW

Systematic Random Sample


A sampling method where every "k"th individual is selected for the sample (e.g. every 2nd, 4th, 20th individual).

 SUMMARY

A simple random sample is the ideal sampling method if your goal is to obtain a representative sample. Sometimes, with big populations, it's not feasible to assign
everyone a number or put everything into a hat so other sampling methods may be used. The random number generator is typically used with a calculator and is a
fast way to calculate random “integers” without needing to assign same-number digits to each individual. The random number table is a more time-consuming
method and generally used when technology is not available. A systematic sample can be similarly valid, and it is much easier to perform. It involves taking every
"k"-th individual--however, the population must be randomly sorted before the systematic selection. Otherwise, it won't be considered random.

Good luck!

Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS.

 TERMS TO KNOW

Random Number Generator


A method of collecting a sample that utilizes technology to select random numbers corresponding to individuals in the population

Random Number Table


A method of collecting a sample that uses a table of random digits to select numbers corresponding to individuals in the population. Each individual is assigned a number,
and numbers are then selected from the table.

Simple Random Sample


A method of selection that guarantees that every sample of a certain size has an equal chance of being the selected sample

Systematic Random Sample


A sampling method where every "k"th individual is selected for the sample (e.g. every 2nd, 4th, 20th individual)

Stratified Random and Cluster Sampling
by Sophia

 WHAT'S COVERED

This tutorial will cover the topic of stratified random sampling, which is a random sampling procedure that subdivides the population into groups. In addition, we will
introduce cluster samples. This lesson will focus on:

1. Stratified Random Samples


2. Cluster Samples
3. Real-World Comparison

1. Stratified Random Samples


Suppose a high school has just adopted a new, healthy lunch provider, and they would like to solicit student opinion on the healthy lunch options. The school has a total of
420 students: 100 freshmen, 110 sophomores, 120 juniors, and 90 seniors.

How would a simple random sample look?

For a simple random sample of 42 students, think of ways that 42 students could be chosen, each having an equal chance of being selected. First, assign each student a
unique number 1 to 420 (total number of students). Once this is done, you could:

Use a random number generator to select 42 numbers, ignoring repeats. The students who corresponded to those numbers will be surveyed about the school's new,
healthy options.
Put the 420 student names in a hat and draw out 42.

Now, is there a way that the study might be improved to guarantee an accurate cross-section of students across the grades? After all, freshmen might feel differently about
the healthy options than seniors, so it will be important to have individuals from each grade weigh in on the lunch options.

This can be done with a stratified random sample. Stratified random sampling is a method where the population is subdivided into groups called strata. Strata are groups
with homogeneous characteristic(s). The population is separated by a characteristic that we think might affect the responses (here, grade level). This keeps the sample
from containing too many, or too few, individuals with that characteristic.

In the above example, it would look something like this: since 42 is 10% of the school's population, your survey should be 10% of each grade.

10% of the freshman class of 100 is 10, so you would want to randomly select ten individuals from the freshman class to participate.
10% of the sophomore class of 110 is 11, so you would want to randomly select 11 individuals from the sophomore class to participate.
10% of the junior class of 120 is 12, so you would want to randomly select 12 individuals from the junior class to participate.
10% of the senior class of 90 is 9, so you would want to randomly select nine individuals from the senior class to participate.

Once the groups are in place, a simple random sample is carried out within each stratum, like putting names in a hat or assigning everyone a unique number and randomly
selecting numbers. You can have as many strata as you please, but they must be roughly homogeneous.
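
The 10%-per-grade allocation can be sketched in Python. The grade sizes come from the example above; the student ID numbering and the function name are assumptions made for this illustration.

```python
import random

# Grade sizes from the example: 420 students in total.
grade_sizes = {"freshmen": 100, "sophomores": 110, "juniors": 120, "seniors": 90}

def stratified_sample(strata, percent, rng):
    """Take a simple random sample of `percent`% of the members of each stratum."""
    sample = {}
    for name, members in strata.items():
        n = len(members) * percent // 100  # e.g. 10% of 110 is 11
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical roster: student IDs 1-420, assigned grade by grade.
strata = {}
next_id = 1
for grade, size in grade_sizes.items():
    strata[grade] = list(range(next_id, next_id + size))
    next_id += size

picked = stratified_sample(strata, 10, random.Random())
for grade, students in picked.items():
    print(grade, len(students))  # 10, 11, 12, and 9 students respectively
```

Unlike a simple random sample of 42, this design fixes how many students come from each grade; only which students are chosen within a grade is left to chance.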

Video Transcription
[MUSIC PLAYING] Pretend you've subdivided billiard balls into low, middle, and high numbers. To take a stratified random sample of the 15, this is what you do. Put
all the low numbered balls in hat one. Put all the middle numbered balls in hat two. And finally, put all the high numbered balls in hat three.

At that point, you'd randomly select two from each hat. The result would give you a stratified random sample of six billiard balls. You're guaranteed to have exactly two
low numbers, exactly two middle numbers, and exactly two high numbers.

 TERM TO KNOW

Stratified Random Sample


A random sampling method where individuals are separated into homogeneous groups, then simple random samples are taken within each group.

Stratum/Strata
The homogeneous groups in a stratified random sample. All individuals in each stratum have something in common, and we would like to see how that affects the
outcome of the sample.

2. Cluster Samples
When using a cluster sample, the population is divided into groups. These groups are called clusters. It’s important to note that these groups are natural groupings.
Typically, their members don't have anything in common other than, say, geography. We then take a random sample of clusters instead of a random sample of
individuals.

Each individual in the cluster is going to be part of the sample if we select that cluster. So unlike the groups in a stratified random sample, the groups in a cluster sample
aren't based on a characteristic or variable. The individuals in the cluster just happen to be near each other.

IN CONTEXT
Suppose you work at a potato chip company and it’s your job to implement some quality control in the manufacturing department. Maybe you stand at the start of
the assembly line and take a simple random sample of individual chips. That would work just fine.

However, it might be easier for you to sample some bags of chips. The bags of chips are clusters. You would then take a bag of chips off the assembly line and
sample every chip in that bag for quality control. That’s cluster sampling.
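
A minimal Python sketch of the bag-of-chips idea follows. The number of bags, the chips per bag, and all names are hypothetical values chosen only for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters; every member of a picked cluster is sampled."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical production run: 50 bags with 40 chips each.
bags = {
    f"bag-{b}": [f"bag-{b}/chip-{c}" for c in range(1, 41)]
    for b in range(1, 51)
}

inspected = cluster_sample(bags, 3, random.Random())
for bag, chips in inspected.items():
    print(bag, "->", len(chips), "chips inspected")
```

The randomness applies to which bags are chosen, not to which chips: once a bag is selected, all of its chips are checked.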

Similar to every sampling method, cluster sampling has pros and cons.

Advantages and Disadvantages for Cluster Sampling

Advantages:
Easier than a simple random sample, and often it doesn't cost as much
Typically gives similar results because the clusters are fairly heterogeneous

Disadvantages:
Risk that clusters are NOT heterogeneous; perhaps they share some characteristic, beyond simply being in different locations, that might affect the sample's findings

 TERMS TO KNOW

Cluster Sample
A sampling method where the population is separated into groups, typically geographically, and a random selection of clusters is made. Each individual in the
cluster becomes part of the sample.

Clusters
Smaller subgroups of the population, not necessarily similar in any way besides all being together in one place, making the individuals easier to sample together.

3. Real-World Comparison
Suppose a landlord of an apartment complex wants to know whether a new carpet he's considering is appropriate for all the apartments in the building. Each of the four
floors has eight apartments.

 THINK ABOUT IT

What would a simple random sample look like? How might a cluster sample be different from a stratified random sample?

Simple Random Sample: He could randomly select eight apartments from the building.
Stratified Random Sample: He could randomly select two apartments per floor.
Cluster Sample: He could take a spinner like the one shown below and spin it.

Suppose it landed on three. That means that every apartment on the third floor would receive carpeting. He doesn't have to send the carpet installers to different rooms on
different floors. He can simply instruct everyone to go up to the third floor and install carpet in every room on that floor, which would be far easier for him and just as
cost-effective.
But what if the floors were NOT all a fair mix of the building's apartments? What if apartments on the third floor allowed pets? The carpet might not hold up as well there.
That’s one of the disadvantages of cluster sampling in action. But typically, the clusters are fairly representative and give results very similar to a simple random sample.
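
The landlord's three options can be compared side by side in Python. The apartment numbering (101 through 408) is an assumption made for this sketch; the lesson only says there are four floors of eight apartments.

```python
import random

# Hypothetical numbering: floor 1 has apartments 101-108, floor 2 has 201-208, etc.
floors = {
    floor: [f"{floor}{unit:02d}" for unit in range(1, 9)]
    for floor in range(1, 5)
}
all_apartments = [apt for units in floors.values() for apt in units]
rng = random.Random()

# Simple random sample: any eight apartments in the building.
srs = rng.sample(all_apartments, 8)

# Stratified random sample: two randomly chosen apartments from each floor.
stratified = [apt for units in floors.values() for apt in rng.sample(units, 2)]

# Cluster sample: "spin the spinner" to pick one floor, then take every apartment on it.
spin = rng.choice(list(floors))
cluster = floors[spin]

print("SRS:       ", sorted(srs))
print("Stratified:", stratified)
print(f"Cluster (floor {spin}):", cluster)
```

All three designs test the carpet in eight apartments; they differ in how those eight are spread across the floors.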

 SUMMARY

In a stratified random sample, the population is broken down into homogeneous groups called "strata." The reason for this is to keep a characteristic that may
affect the results from being over- or under-represented in the sample. The idea is to sort individuals into groups by that characteristic and then take a simple random
sample within each of the strata. Cluster sampling, on the other hand, is done by taking naturally occurring groups, typically geographic ones, and taking a
simple random sample of the clusters. Then, each member of a selected cluster becomes part of the sample. A couple of advantages of cluster samples are that they are
more cost effective and usually achieve results similar to those of a simple random sample. The disadvantage is that sometimes a cluster may not be heterogeneous,
as seen in the landlord example, where the third floor allowed pets.

Good luck!

 TERMS TO KNOW

Cluster Sample
A sampling method where the population is separated into groups, typically geographically, and a random selection of clusters is made. Each individual in the cluster
becomes part of the sample.

Clusters
Smaller subgroups of the population, not necessarily similar in any way besides all being together in one place, making the individuals easier to sample together.

Stratified Random Sample


A random sampling method where individuals are separated into homogeneous groups, then simple random samples are taken within each group.

Stratum/Strata
The homogeneous groups in a stratified random sample. All individuals in each stratum have something in common, and we would like to see how that affects the
outcome of the sample.

Multi-Stage Sampling
by Sophia

 WHAT'S COVERED

This tutorial will introduce multi-stage sampling, focusing specifically on:

1. Comparing Sampling Methods


2. Multi-Stage Sampling

1. Comparing Sampling Methods


Suppose that you wanted to sample from the entire United States as a whole.

Can you perform a simple random sample (SRS)?

You'd have to somehow account for every person in the United States, assign each of them a number, and then pull numbers out of a hat or use some other random
sampling procedure. Listing and numbering everyone in the country would be too difficult.

Can you perform a stratified random sample?

Strata, in this case, are still too big. You might take a few people from Maine, and a few people from Minnesota, and a few people from North Dakota, etc., and it would still
be too large. Plus, it really wouldn't be cost effective, commuting to all these different places.

Can you perform a cluster sample of states?

If you identified states as clusters, you would randomly select some of the clusters and then sample everyone within that cluster. You'd be sampling entire states. For
example, everyone in North Carolina would be in the sample if you select that state as a cluster, which simply isn't feasible.

Therefore, none of those really make any sense. The way out of the box here is a multi-stage design.

2. Multi-Stage Sampling
Multi-stage sampling is a common sampling procedure utilized when the population is very, very large. With multi-stage sampling, you continue zooming in from larger
areas to smaller and smaller areas until you can find a small enough sample of the people you need.

To perform a multi-stage sampling, first select clusters, then take a simple random sample from each cluster.

Let's take a look at an example:

Video Transcription
[MUSIC PLAYING] Suppose you want to sample the United States as a whole. Because of geographic simplicity, states make the most sense as clusters. If every
state needs to be represented, a stratified random sample should be performed. However, it's not realistic or feasible to sample everyone within each state. So in this
instance, you can randomly select five states to make up the clusters for your multi-stage sample. Of these five states, you pick one to begin the process.

Let's say you start with Minnesota. And because it's equally unrealistic to sample everyone in a state, you continue to narrow down your population with a random
selection of counties. You once again select five. If you were able to sample everyone in these counties, you can stop. But if you still need a smaller sample size,
randomly choose just one, such as Carver County. Then you can randomly select three towns within that county.

Again, if those are small enough units, you can stop. However, if the sample size is still too large, continue to narrow it down by selecting just one town, like Chaska.
Within Chaska, for example, you can sample some neighborhoods.

Typically, by the time you get to the neighborhood level, it's easy enough to walk around and get almost everybody within that neighborhood. This method of drilling
down from state to county to town to neighborhood would give you a multi-stage sample of your first cluster, Minnesota. Then it's on to the next cluster, where you
would repeat the process with the remaining four to achieve a multi-stage sampling of the United States.

 STEP BY STEP

Step 1: States
When sampling the United States as a whole, states make the most sense as clusters because of geographic simplicity. It’s not realistic or feasible to sample everyone
within a state, so randomly select just five states: California, Tennessee, Minnesota, Massachusetts, and Oklahoma. Pick one state and start the process.
Step 2: Counties
It is equally unrealistic to sample everyone in Minnesota, so you can narrow your sample by randomly selecting counties. Perhaps you select Carver County, Marshall
County, and maybe a few other counties. If that's a small enough basis for you to get everyone within the county, then you can stop.
Step 3: Towns
If you need yet a smaller sample size, you can choose just one county, like Carver County, and sample towns within that county. Perhaps you randomly select three of
those towns: Chanhassen, Waconia, and Chaska. If those are small enough units, then you can stop.
Step 4: Neighborhoods
However, if the sample size is still too large, you can continue to narrow it down. Within Chaska, for example, you can sample some neighborhoods. Typically by the
time you get to neighborhoods within a town, it's easy enough to walk around the neighborhood and get almost everybody within that neighborhood.
Now you can move on to the next cluster, where you would repeat this process with the remaining four states.
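
The steps above can be sketched in Python as a repeated "pick some units, then zoom in" procedure. Everything in the toy data below is hypothetical (unit names, counts, and the choice of two neighborhoods per town); only the overall pattern of states, counties, towns, and neighborhoods comes from the lesson.

```python
import random

def multi_stage_sample(units, picks_per_stage, rng):
    """Randomly pick units at each stage, then descend into the picked units.

    `units` is a nested dict whose innermost values are lists of people.
    `picks_per_stage` says how many units to pick at each stage, top to bottom.
    """
    if not picks_per_stage:
        # Final stage reached (a neighborhood): sample everyone in it.
        return list(units)
    n = min(picks_per_stage[0], len(units))
    people = []
    for name in rng.sample(sorted(units), n):
        people.extend(multi_stage_sample(units[name], picks_per_stage[1:], rng))
    return people

def toy_country(n_states=6, n_counties=4, n_towns=3, n_hoods=3, n_people=5):
    """Build a small made-up hierarchy: state -> county -> town -> neighborhood -> people."""
    return {
        f"state-{s}": {
            f"county-{c}": {
                f"town-{t}": {
                    f"hood-{h}": [f"s{s}-c{c}-t{t}-h{h}-p{p}" for p in range(n_people)]
                    for h in range(n_hoods)
                }
                for t in range(n_towns)
            }
            for c in range(n_counties)
        }
        for s in range(n_states)
    }

country = toy_country()
# Five states, then one county per state, one town per county, two neighborhoods per town.
sample = multi_stage_sample(country, [5, 1, 1, 2], random.Random())
print(len(sample), "people sampled")  # 5 * 1 * 1 * 2 * 5 = 50
```

Each stage shrinks the area that has to be visited, which is what makes the final stage, sampling everyone in a neighborhood, practical.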

 TERM TO KNOW

Multi-Stage Sampling
A sampling design which combines elements of cluster sampling, stratified random sampling, and simple random sampling. It "zooms in" on smaller areas to
sample so that sampling becomes more feasible.

 SUMMARY

Multi-stage sampling is used when the population is so big and the groups, strata or clusters so large that it makes more sense to zoom in and take small groups.
You begin with certain clusters, and then you sample within those clusters instead of taking the full cluster. Therefore, multi-stage sampling combines elements of
cluster sampling, stratified designs, and simple random designs, which were contrasted within this tutorial, though you may recall, none of these were feasible
when attempting the sample of the United States.

Good luck!

Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS. MN MAP:
HTTPS://EN.WIKIPEDIA.ORG/WIKI/LIST_OF_COUNTIES_IN_... CARVER COUNTY: HTTPS://EN.WIKIPEDIA.ORG/WIKI/LIST_OF_COUNTIES_IN_...

 TERMS TO KNOW

Multi-Stage Sampling
A sampling design which combines elements of cluster sampling, stratified random sampling, and simple random sampling. It "zooms in" on smaller areas to sample so
that sampling becomes more feasible.

Observational Studies and Experiments
by Sophia

 WHAT'S COVERED

This tutorial will explore observational studies and how they are conducted. We will also cover experiments, which are a little different than observational studies,
through the exploration of:

1. Observational Studies
2. Types of Observational Studies
3. Experiments
4. Experiments vs. Observational Studies

1. Observational Studies
An observational study is a type of study where the researcher can observe but does not administer any treatment. Therefore, whatever would normally happen, the
researcher has to allow it to happen.

Researchers can't change anything about the people or subjects they are studying. The researcher can record the variables of interest, but again, can't affect the study.
People have to be allowed to do whatever it is they were going to do without interruption.

 TERM TO KNOW

Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or outcomes in any way.

2. Types of Observational Studies


There are two types of observational studies:

Retrospective Study: Researchers look to the past to see what has already happened; also known as a case-control study.

 EXAMPLE Consider observing people who are sick--those are called the cases--versus people that aren't sick, which are the controls. Then, you look back to
see what similarities the cases have in common and what similarities the controls have in common.

Prospective Study: Researchers select individuals to participate and record what happens as it happens; also known as a longitudinal study.

 EXAMPLE Individuals are engaging in activities like smoking or jogging. You record what happens as it happens, as opposed to trying to look back and figure it
out.

IN CONTEXT
The year is 1929 and a cancer doctor has a suspicion that smoking may cause cancer. His cancer patients become his subjects, or participants, in his study. He
asks his subjects, “Did you happen to smoke before you got cancer?” What he found was an overwhelming majority of his cancer patients did, in fact, smoke.
Therefore, this doctor was the very first person to suggest a link between smoking and cancer.

That inspired some new studies, one of which began in 1934. It dealt with several thousand doctors, so it was a physicians' smoking study. The reason doctors
were chosen is that doctors are usually very diligent about following protocols, meaning that those who smoked would likely continue to smoke, and those who
didn't smoke would likely continue not smoking. Also, doctors typically wouldn't drop out of a study. Notice in the image below how some of these physicians
smoked, and some of them did not.

They did the study, and some of the doctors got cancer. Now, not every doctor who smoked ended up getting cancer, and not every person who got cancer was a
smoker. However, what they found was that the vast majority of the time, it was the doctors who smoked that got cancer.

This study was conducted over a long period of time--a 20-year study. At its conclusion, this was the most convincing evidence that smoking had an effect on
cancer. This was an example of a prospective study because it started with the doctors and followed them through to 1954.

It is important to note, however, that neither of these types of studies, prospective or retrospective, can actually prove a cause-and-effect relationship. The only thing that
can prove a cause-and-effect relationship between two variables is an experiment.

 TERMS TO KNOW

Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they became the way they are in the present.

Prospective Study
A study that begins by selecting participants, tracking them, and keeping data on the subjects as they go into the future.

Subjects/Participants
The people or things being examined in an observational study.

3. Experiments
An experiment is a different type of study than an observational study. The differences will be covered in detail shortly, but essentially, the researchers are allowed to
impose treatments on the participants. Treatments are administered and response to those treatments is measured. Because the researchers are the ones implementing
the treatments and measuring the response, a cause-and-effect relationship between variables can be determined.

When discussing experiments, there is some very common terminology that you should be aware of. For example, as mentioned in the section above, subjects and
participants are used interchangeably and describe people involved in an experiment. If animals or things are used in an experiment, they are referred to as experimental
units. While it may seem a bit impersonal, it is universal terminology in the field of experiments.

 TERMS TO KNOW

Experiment
A type of study where researchers impose treatments on the participants or experimental units.

Experimental Unit
An animal or thing involved in an experiment.

4. Experiments vs. Observational Studies

In an observational study, the researcher observes the individuals but does not administer treatment. The researcher simply has to allow what would normally happen to
happen. Again, they can record variables of interest, but not affect them in any way. The researcher is not necessarily an active participant in the study, other than observing
and recording.

An experiment, on the other hand, is far more active on the part of the researcher. The researcher is creating the differences between the two groups, then determining
whether or not there is a cause-and-effect relationship.

If you have a study that you'd like to do but can't perform as an experiment due to ethical or practical concerns, or because it would take too much time or money, you can
circumvent those concerns by doing an observational study.

 THINK ABOUT IT

When trying to determine if cigarette smoking causes cancer, several observational studies have been conducted, but never a true experiment. Why would that be?

Well, it would be unethical to break people into groups and administer cigarettes to a group of people when trying to determine if it causes terminal illness. The same
applies to alcohol consumption.

 BIG IDEA

There are certain instances in which an observational study will be preferred over an experiment due to factors like time, money, and privacy (for example, when it is
unlikely people will divulge certain types of information).

 SUMMARY

An observational study is a type of study where the researcher can observe but not influence the behavior of the participants, or subjects. A retrospective study
involves looking back at behavior, while a prospective study involves gathering your participants and following them along as they live their lives. An observational
study, though, cannot prove a cause-and-effect relationship.

Conversely, in an experiment, a researcher can directly influence the subjects by applying treatments. Because the researchers are the ones implementing the
treatments and measuring the response, a cause-and-effect relationship between variables can be determined. Terminology such as subjects and participants is
important to know since it identifies individuals directly involved in the experiment. Animals may be directly involved in an experiment, but they are referred to as
experimental units rather than subjects or participants.

Sometimes an experiment may be unethical, expensive, or too lengthy. In those cases, observational studies may be used, which allow a researcher to study
occurrences in a natural setting without administering treatment of any kind.

Good luck!

 TERMS TO KNOW

Experiment
A type of study where researchers impose treatments on the participants or experimental units.

Experimental Unit
An animal or thing involved in an experiment.

Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or outcomes in any way.

Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the subjects as they go into the future.

Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they became the way they are in the present.

Subjects/Participants
The people or things being examined in an observational study.

Prospective and Retrospective Studies
by Sophia

 WHAT'S COVERED

This tutorial will explore observational studies and how they are conducted. We will also cover experiments, which are a little different than observational studies,
through the exploration of:

1. Observational Studies
2. Types of Observational Studies
a. Retrospective Study
b. Prospective Study

1. Observational Studies
An observational study is a type of study where the researcher can observe but does not administer any treatment. Therefore, whatever would normally happen, the
researcher has to allow it to happen.

Researchers can't change anything about the people or subjects they are studying. The researcher can record the variables of interest, but again, can't affect the study.
People have to be allowed to do whatever it is they were going to do without interruption.

 TERM TO KNOW

Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or outcomes in any way.

2. Types of Observational Studies


There are two types of observational studies:

2a. Retrospective Study


A retrospective study, also known as a case-control study, is one in which researchers look to the past to see what has already happened.

It can be similar to a matched-pair design in an experiment, but in this case, the researchers are not giving a treatment or doing anything to affect the people.

 EXAMPLE In a study, suppose you take a pair of participants, who are similar across most variables except for one major difference -- one participant has a
disease, "the case", and the other participant does not, "the control". Because the participants are so similar, you can focus on just that disease
and see how it affects the participants or what may have caused it.

This is considered retrospective because it looks in the past. You ask the participants to recall past events or use information about their past to determine what risk
factors there are for the disease.
 TERM TO KNOW

Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they became the way they are in the present.

2b. Prospective Study


A prospective study, also known as a longitudinal study, occurs over a long period of time. It observes the same set of people and follows the same variables over that
stretch of time, which can be as long as several decades. While this type of study is not quick to do, it provides a lot of data, and many different researchers can use this
information in a variety of ways.

 EXAMPLE The Framingham Heart Study started in 1948 and is still going on today. 5,209 healthy adults from Framingham enrolled in this study. Researchers
collected a variety of information about the subjects, including social networks, eating habits, exercise habits, and several markers for heart health.

Over a thousand different research papers have been written using this information. Some of these papers have shown that obesity and smoking are associated with an
increased risk of heart failure. Other papers look at how social networks tie to obesity risks.
 TERM TO KNOW

Prospective Study
A study that begins by selecting participants, tracking them, and keeping data on the subjects as they go into the future.

Subjects/Participants
The people or things being examined in an observational study.

 SUMMARY

An observational study is a type of study where the researcher can observe but not influence the behavior of the participants, or subjects. A retrospective study
involves looking back at behavior, while a prospective study involves gathering your participants and following them along as they live their lives. An observational
study, though, cannot prove a cause-and-effect relationship.

Good luck!

 TERMS TO KNOW

Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or outcomes in any way.

Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the subjects as they go into the future.

Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they became the way they are in the present.

Subjects/Participants
The people or things being examined in an observational study.

Experimental Design
by Sophia

 WHAT'S COVERED

In this tutorial, you're going to learn about the principles of experimental design.

1. Components of Experimental Design


a. Control
b. Randomization
c. Replication

1. Components of Experimental Design


Experimental design refers to how an experiment is carried out. Many experimental designs include a control group and a treatment group to compare effects of treatment
(exercise, drug, video watching, etc.). You can have a good design of an experiment or a poor design of an experiment.

Good experimental design will have these three components:

1. Control
2. Randomization
3. Replication

 TERMS TO KNOW

Experimental Design
The way in which an experiment is carried out. A good design has key elements of randomization, replication, and control.

Treatment
Something the researchers administer to the subjects or experimental units.

1a. Control
Control means holding everything else besides what you're trying to measure constant. The purpose is to determine whether or not your treatment is effective. In other
words, if there is an observable difference between groups, is it due to the treatments or due to a confounding variable? It is important to control all other variables to help
limit confounding.
One common way to control an experiment is with a control group. A control group is a set of subjects or experimental units that does not receive the treatment under consideration. For instance,
if you were studying a new cancer treatment, a control group might get the standard cancer treatment care, while the treatment group receives the new drug or treatment
being evaluated. In this case, the control group allows researchers to measure the effectiveness of the treatment against a group that is otherwise similar.

Source: This work is adapted from Sophia author Jonathan Osters.

Video Transcription
[MUSIC PLAYING] Hello. Let's take a look at a real-life example using experimental design. Suppose a farmer wants to try a new fertilizer in the fields. The three
components of experimental design can be used to determine if the new fertilizer is better than the old one. Here's how it would work.

The first thing the farmer would do is determine the control by selecting 10 fields with similar soil nutrients, sunlight, and water. These are all variables that could
affect the crop growth. The farmer would then apply the old fertilizer to five fields and the new fertilizer to the other five. By keeping the control elements consistent
across the 10 fields, the differences between them can be isolated and attributed to either the old or the new fertilizer.

Next, the farmer takes randomization into account by randomly assigning which five fields will get the new fertilizer. While the fields selected were as similar as
possible, there may be an unknown variable that was not accounted for. Perhaps some fields had moles underground. And that would affect how the crops grow.

By randomly assigning treatments, the farmer should get some fields with moles using the new fertilizer and some fields with moles using the old fertilizer.
Randomization smooths out those effects that unknown variables might bring into the equation.

Lastly, the farmer understands the significance of repeated results rather than a one-off result. Say the farmer was only able to find two fields similar to each other
and randomly assigned one for the new fertilizer and one for the old. It is possible in that case that the field with the old fertilizer does very well just by random
chance. This would make it seem like the new fertilizer is not effective when perhaps it is.

Or the opposite could happen where it seems like the new fertilizer is effective when it's not. So it would always be better to randomly assign 10 fields as the farmer is
more likely to find valid trends among 10 fields than two. Thanks for watching. And see you next time.

IN CONTEXT
Suppose you are a farmer and you want to try a new fertilizer in your field. One thing you could do is choose ten fields with similar soil nutrients, sunlight, and
water--all variables that could affect the crop growth.

You could then apply the old fertilizer to five fields and the new fertilizer to the other five. By keeping all the other variables--soil nutrients, sunlight, water--
consistent, the differences between the fields can be isolated and attributed to the old fertilizer or the new fertilizer.

Does the new fertilizer work? Is it effective? This is the idea behind controlling for all of these other variables.

 TERM TO KNOW

Control
The principle of experimental design that requires that other variables which may confound the experiment be held constant between the treatment groups so that
any differences in the groups can be attributed to the different treatments.

Control Group
A group included in an experiment that does not receive the treatment under consideration and against which other experimental results can be compared and
validated.

1b. Randomization
The second big idea of experimental design is randomization. The treatments must be assigned to the subjects using a random process, otherwise known as
"randomization." The purpose of random assignment is to try to filter out all the other sources of variation that you couldn't anticipate and control for.

 EXAMPLE Referring to the farmer example, even though you made the fields as similar as possible with respect to water, sunlight, and soil, it's possible that
there is a variable that you didn't think to control for. Perhaps some fields had moles under the ground, and that would affect how the crops grow. How would you
know to control for moles?

By randomly assigning treatments to the fields, you can hopefully get fields with moles in both the new-fertilizer and old-fertilizer groups. Randomization smooths
out those effects that other variables might bring into the equation.
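
The random assignment in the farmer example can be sketched in a few lines of Python. This is only an illustration, not a prescribed procedure: the field labels and the `assign_fertilizer` helper are made-up names, and the fixed seed is there only so that the run can be repeated.

```python
import random

def assign_fertilizer(fields, n_new, seed=None):
    """Randomly pick n_new fields for the new fertilizer; the rest get the old one."""
    rng = random.Random(seed)
    new_fields = set(rng.sample(fields, n_new))
    return {field: ("new" if field in new_fields else "old") for field in fields}

# Hypothetical labels for the farmer's ten similar fields.
fields = [f"field_{i}" for i in range(1, 11)]
assignment = assign_fertilizer(fields, n_new=5, seed=1)

for field, fertilizer in assignment.items():
    print(field, fertilizer)
```

Because the computer, not the farmer, chooses which five fields get the new fertilizer, an unknown factor such as moles has no systematic way to line up with one treatment.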

 HINT

Randomizing also helps avoid bias, because you can’t be tempted to assign treatments to the experimental units you think might give favorable outcomes.
Randomization in an experiment does not really achieve the same purpose as a random selection in a sample. When you do a simple random sample, the idea is to get a
sample that's representative of the population. In an experiment, the purpose of randomly assigning individuals to groups is to filter out unknown sources of variation. The
assignment in an experiment, however, is fairly similar to the way you would randomly select in a sample.

 TERM TO KNOW

Randomization
The principle of experimental design that requires that the subjects/experimental units be assigned to groups using some random process. This helps ensure that the
groups are roughly equivalent before the treatments are applied.

1c. Replication
Replication is the last key idea in experimental design, which basically states that a bigger sample is better. Repeating the experiment on many subjects or experimental
units is a better idea than running it on just a few. Why is that?
A larger size of the experiment means it's more likely that you can find trends that perhaps you wouldn't have found in a smaller experiment. The more you replicate, and
the more experimental units you can get into your experiment, the more likely it is that you're going to find the true trends that arise, rather than some freak anomaly.

 THINK ABOUT IT

What if the farmer could only find two fields that were similar to each other, instead of 10, and randomly assigned one to get the new fertilizer and one to
get the old? Isn't it possible in that case that the field with the old fertilizer does very well just by random chance?

This would make it seem like the new fertilizer is not effective when perhaps it is. Or the opposite could happen, where it seems like the new fertilizer is effective when it's
not. It would be better to randomly assign 10 fields, as opposed to just two, as it is more likely that the trends the farmer finds among those 10 fields are
valid.
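
A small simulation can make this concrete. Everything numeric here is an assumption chosen for illustration: yields are drawn from a normal distribution, the old fertilizer averages 100 units, the new one averages 105, and field-to-field noise has a standard deviation of 10. The `misleading_rate` function is a hypothetical helper, not part of any library.

```python
import random

def misleading_rate(fields_per_group, trials=10_000, seed=0):
    """Fraction of simulated experiments where the old fertilizer looks better
    even though the new one is truly better on average."""
    rng = random.Random(seed)
    misleading = 0
    for _ in range(trials):
        # Assumed yields: old averages 100, new averages 105, same noise in every field.
        old = [rng.gauss(100, 10) for _ in range(fields_per_group)]
        new = [rng.gauss(105, 10) for _ in range(fields_per_group)]
        if sum(old) / len(old) > sum(new) / len(new):
            misleading += 1
    return misleading / trials

print("1 field per fertilizer: ", misleading_rate(1))
print("5 fields per fertilizer:", misleading_rate(5))
```

Under these assumptions, the two-field experiment points the wrong way noticeably more often than the ten-field one, which is the reason replication matters.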
 TERM TO KNOW

Replication
Repeating the experiment on multiple subjects/experimental units. This principle of experimental design states that a larger experiment with more
subjects/experimental units will allow us to more clearly see differences between the treatments.

 SUMMARY

The components of an experimental design--that is, a well-designed experiment--are control, randomization, and replication. Control helps to isolate the effects of
the treatments, randomization helps to make the groups as similar as possible and helps to avoid bias, and replication helps you to see the differences that might
not have been evident if you had used a small sample. Treatments, again, are the things that the researchers administer to the subjects or experimental units.

Good luck!

 TERMS TO KNOW

Control
The principle of experimental design that requires that other variables which may confound the experiment be held constant between the treatment groups, so that any
differences in the groups can be attributed to the different treatments.

Experimental Design
The way in which an experiment is carried out. A good design has key elements of randomization, replication, and control.

Randomization
The principle of experimental design that requires that the subjects/experimental units be assigned to groups using some random process. This helps ensure that the
groups are roughly equivalent before the treatments are applied.

Replication
Repeating the experiment on multiple subjects/experimental units. This principle of experimental design states that a larger experiment with more
subjects/experimental units will allow us to more clearly see differences between the treatments.

Treatment
Something the researchers administer to the subjects or experimental units.

Randomized Block Design
by Sophia

 WHAT'S COVERED

This tutorial is going to teach you about a randomized block design. A randomized block design is a little bit different than other types of designs that we've studied
so this tutorial will focus on:

1. Randomized Design
2. Block Design vs. Randomized Design

1. Randomized Design
Randomized block design is a type of experiment where participants are first divided into homogeneous groups. This means that they are the same across some variable of
interest, such as age, race, income, location, job, or gender.

Once participants are in their similar group, they are randomly assigned to treatment or control within that group.

An advantage is that it controls for variables that would otherwise be confounding. If we think that job has an effect, we can make sure that a proportional number of people
who have the same job are assigned to the treatment and control groups.

A disadvantage is that it can reduce the sample size of each group.

IN CONTEXT
Suppose you are a researcher and you want to identify whether a new acid reflux drug is more effective than the one that's currently available. You gather 500
volunteers with acid reflux, put the number one on 250 cards, and the number two on another 250, and place all the cards in a hat. You mix them up and have
people pull out numbers.

People who selected "1" receive the new drug, and those who selected "2" receive the old drug. The image below would be your original plan, starting with all
these volunteers, men and women, and then randomly assigning them to groups.

The problem is, what if men and women respond differently to the drug?

The better design is using a randomized block design, so you try something different. First, take your large group and break it into smaller subgroups of just men
and just women.

The image above has nine men and 14 women; you had a lot more in the old design, but now you’re going to run the experiments essentially in parallel: one
experiment for men and one experiment for women. First, take the men and randomly assign about half of them to the treatment and the rest to the control.
Then do the same with the women, which looks like this:

Men and women receiving the treatment are in purple, and the men and women receiving the control are in green. You might notice there are five men receiving
treatment and only four receiving control. It’s not necessary to have exactly equally sized groups.
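
The blocking step can also be sketched in Python. This is a minimal sketch under stated assumptions: the subject IDs (`M1`, `W1`, and so on) and the `randomized_block_assignment` helper are invented for illustration, and giving the extra subject in an odd-sized block to the treatment group is just one reasonable choice.

```python
import random

def randomized_block_assignment(subjects, block_of, seed=None):
    """Split subjects into blocks, then randomly assign within each block."""
    rng = random.Random(seed)
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of[subject], []).append(subject)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        # Odd-sized blocks give the extra subject to the treatment group.
        half = (len(shuffled) + 1) // 2
        for subject in shuffled[:half]:
            assignment[subject] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject] = "control"
    return assignment

# Hypothetical IDs for the 9 men and 14 women in the example.
block_of = {f"M{i}": "men" for i in range(1, 10)}
block_of.update({f"W{i}": "women" for i in range(1, 15)})

assignment = randomized_block_assignment(list(block_of), block_of, seed=42)
for block in ("men", "women"):
    treated = sum(1 for s, g in assignment.items() if block_of[s] == block and g == "treatment")
    control = sum(1 for s, g in assignment.items() if block_of[s] == block and g == "control")
    print(block, "treatment:", treated, "control:", control)
```

Whatever the seed, the men split five and four and the women split seven and seven; only which individuals land in each group changes from run to run.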

 TERM TO KNOW

Randomized Block Design


An experimental design where the subjects are separated into homogeneous groups, called blocks, based on some variable we think may affect the outcome of
the experiment. We then run the experiment separately within each block.

2. Block Design vs. Randomized Design


By doing a block design rather than a completely randomized design, you can observe differences within the group that you might have missed had you done it with a large
group.

 EXAMPLE Suppose the drug was more effective for women than for men. You would see that in this experiment here. You would see that the drug was effective
for women. You would also see that it wasn't effective for men.
One minor disadvantage to running a block design is that you do lose some of the replication that you would have if you had run it in a large group. Sometimes you need to
make your sample size a little bit bigger to overcome that. It might be a little bit harder to draw legitimate conclusions with small groups.

 SUMMARY

In a completely randomized design, you saw how an experiment might miss an extra level of depth, such as men and women reacting differently to a drug. In a randomized
block design, the subjects or experimental units are first grouped by some shared characteristic that you think might affect the outcome. In this example, we used gender.
When evaluating block design vs. randomized design, you saw that with a randomized block design, experiments run in parallel, resulting in two or more separate
experiments. Then, you can compare the treatments within each of those groups.

Good luck!

Source: This work is adapted from Sophia author Jonathan Osters.

 TERMS TO KNOW

Randomized Block Design


An experimental design where the subjects are separated into homogeneous groups, called blocks, based on some variable we think may affect the outcome of the
experiment. We then run the experiment separately within each block.

Completely Randomized Design
by Sophia

 WHAT'S COVERED

This tutorial will discuss a completely randomized design of an experiment through an exploration of:

1. Completely Randomized Design

1. Completely Randomized Design


A completely randomized design means that treatments will be randomly assigned to individual participants in an experiment.

An advantage of this design is that it is very quick and easy to implement. You could take your group of experimental units, assign each one a number at random, and put
the odds in the treatment group and the evens in the control group. Alternatively, you could roll a die for each subject, putting ones and twos in the control group, threes
and fours in the first treatment group, and fives and sixes in the second treatment group.

However, a disadvantage of this design is that treatment and control groups could have disproportionate representations of the population.
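
The die-rolling scheme described above might look like this in Python. The `die_roll_assignment` function and the group labels are illustrative names, and the 30 numbered subjects are a made-up example.

```python
import random

def die_roll_assignment(subjects, seed=None):
    """Roll a fair six-sided die for each subject and map the roll to a group."""
    rng = random.Random(seed)
    groups = {"control": [], "treatment_1": [], "treatment_2": []}
    for subject in subjects:
        roll = rng.randint(1, 6)
        if roll <= 2:
            groups["control"].append(subject)      # ones and twos
        elif roll <= 4:
            groups["treatment_1"].append(subject)  # threes and fours
        else:
            groups["treatment_2"].append(subject)  # fives and sixes
    return groups

# Thirty hypothetical subjects, numbered 1 through 30.
groups = die_roll_assignment(range(1, 31), seed=3)
for name, members in groups.items():
    print(name, len(members), members)
```

Because each subject gets an independent roll, nothing forces the three groups to come out the same size, which is one way the disproportionate groups mentioned above can arise.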

IN CONTEXT
Let’s say you developed a new drug to combat the symptoms of acid reflux. You want to see if it’s more effective than what is currently available. So you get 500
volunteers and write “1” on 250 slips of paper and “2” on the other 250 slips of paper. You put all 500 slips of paper into a hat, mix them up, and the
volunteers retrieve one slip of paper each.

Those who selected “1” will receive the new drug and those who selected “2” receive the drug that's currently available. This is the simplest way to assign
subjects to treatments. However, it's not necessarily ideal for every scenario.

Let’s say that the acid reflux drug is more effective for men than it is for women. It’s not really a problem if you divide the treatment and control groups like this:

In this particular case, you can see there is roughly the same number of females and males in the treatment group and the control group. Since there is a relatively
equal assignment on each side, it will be easy to see if the new drug is more effective for males than for females. Problems occur when the random assignment
doesn't reflect the proportions of the population in each group.

Consider for a moment if this happened:

Both groups are roughly the same size. Will you be able to determine if the treatment is more effective for men? Why not?

If the drug were more effective for men than for women, you actually wouldn't notice because there aren't that many men in the treatment group. The proportions
are way out of whack. This sometimes happens with random assignment.

You can see that in a completely randomized design, subjects are assigned using random processes such as numbers in a random number generator, random number
table, numbers in a hat, or names in a hat. The problem is that it's not always the best way to assign treatments.
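
As a rough sketch of the numbers-in-a-hat approach, the Python below hands out 500 slips and counts how many men end up on the new drug. The even split of 250 men and 250 women is an assumption made only for this illustration (the example does not give the real breakdown), and `draw_from_hat` is a hypothetical helper.

```python
import random

def draw_from_hat(volunteers, seed=None):
    """Give each volunteer a slip: half the slips say 1 (new drug), half say 2 (current drug)."""
    rng = random.Random(seed)
    half = len(volunteers) // 2
    slips = [1] * half + [2] * (len(volunteers) - half)
    rng.shuffle(slips)
    return dict(zip(volunteers, slips))

# Assumption for illustration only: 250 men and 250 women volunteered.
volunteers = [("M", i) for i in range(250)] + [("F", i) for i in range(250)]

for seed in range(5):
    slips = draw_from_hat(volunteers, seed=seed)
    men_on_new_drug = sum(1 for (sex, _), slip in slips.items() if sex == "M" and slip == 1)
    print(f"draw {seed}: {men_on_new_drug} men and {250 - men_on_new_drug} women get the new drug")
```

Each draw is a valid completely randomized design, yet the split of men and women in the treatment group changes from draw to draw. With a much smaller pool of volunteers, the imbalance can become large enough to hide a real difference between the groups.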

 TRY IT

A tire company wants to launch a new type of rubber for its bicycle tires. It has 300 bikes to use for the study and a completely randomized design is desired. What would
be the first step to achieving a completely randomized design?

They could place numbers 1-300 in a hat and have each rider pull out one number. Numbers 1-150 receive the old rubber tires and 151-300 receive the new rubber
tires. The cyclists won’t know which type of tire they are receiving.

There is an issue with this design. Can you think what this might be?
What if bike commuters are all in the same group? They might wear their tires out faster regardless of the new or old tires. Can you think of other aspects that may
impact this experiment?

 BIG IDEA

While there are better ways to gather information for an experiment, a completely randomized design is the easiest.
 TERM TO KNOW

Completely Randomized Design


An experimental design where the assignment of subjects to treatments is done entirely at random.

 SUMMARY

In a completely randomized design, which is the simplest way of assigning individuals, the subjects are assigned using a random process like numbers in a
random number generator, random number table, numbers in a hat, or names in a hat. The problem is it's not always the best way to assign treatments.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Completely Randomized Design


An experimental design where the assignment of subjects to treatments is done entirely at random.

Matched-Pair Design
by Sophia

 WHAT'S COVERED

This tutorial will explain matched-pair design experiments by examining the characteristics and examples of:

1. Matched-Pair Design
a. With Subjects in Pairs
b. With Subjects as Individuals

1. Matched-Pair Design
In a matched-pair design experiment, you form experimental units by pairing subjects that are as similar as possible. One subject goes to the treatment group and the
other subject goes to the control group. Having very similar pairs helps control for the other variables we haven't considered.

 EXAMPLE Choosing a pair of women who are the same age, have the same exercise habits, and live in the same area allows us to look at only the variable we
are studying, while avoiding the effects of age, exercise, and location on the outcomes of the experiment.
In matched-pair design, subjects can be assigned to the treatment and control groups in two different ways:

Subjects who are similar with respect to variables that could affect the outcome of the experiment are paired together, and then one of them is assigned to the
treatment group and one is assigned to the control group.
Each subject is assigned to both groups, where each subject acts as their own matched pair.

 HINT

This type of design is also similar to a case-control study, but here researchers are giving a treatment instead of just observing the participants.
 TERM TO KNOW

Matched-Pair Design
An experimental design where two subjects who are similar with respect to variables that could affect the outcome of the experiment are paired together, then one
of them is assigned to one treatment and one is assigned to the control. This can also be done by assigning each subject to both groups, where each subject acts
as their own matched-pair.

1a. With Subjects in Pairs


Matched-pair design involves matching subjects into pairs that are as similar as possible with respect to any variable that may affect the outcome.

Video Transcription
[MUSIC PLAYING] Hello. Let's take a look at a common instance of how matched pair design is used. An experiment is being conducted to test the effectiveness of a
new flu vaccine. Gender and age are the two variables that may play a significant role in how well this vaccine works.

So how can we study only the effects of the vaccine? A matched pair design, that's how. Groups of two who are similar in both gender and age are created. Then one
is given the vaccine. And the other is given a placebo shot. This allows us to study only the effects of the vaccine and not the effects of the other variables.

Here's how it goes. For this study, there are 20 participants-- 10 men and 10 women of varying ages labeled A through T. The first variable being gender, we separate
the 20 participants into two groups-- one group of 10 males, the other 10 females.

With the second variable being age, we will pair males of similar ages. Then we'll do the same with females. So looking at the males, the first similar ages we see are
24 and 25. Our first matched pair will be participants A and H. Using this process, we see participants L and J, D and C, T and K, and P and R will also be good
matched pairs. Then the same method is applied for similarly-aged females.

Once we have our 10 sets of matched pairs, we can randomly assign the treatment to one half of the pair and the control to the other half. This will allow us to
study how this new flu vaccine works. Oh, that reminds me. Be sure to get your flu shot. I'm getting one after yoga this evening. Kidding. That's ridiculous since I'm a
computer. Totally can't do yoga or flu shots. Thanks for watching and see you next time.

IN CONTEXT

There are 20 participants for an experiment for a flu vaccine. Gender and age may play a role in how well this treatment works. Groups of two are created; each
group is as similar as possible with respect to any variable that may affect the outcome.

Participant  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
Gender       M   F   M   M   F   F   F   M   F   M   M   M   F   F   F   M   F   M   F   M
Age          24  21  42  39  35  37  22  25  31  32  51  31  61  26  38  55  26  56  52  48

There are 10 men and 10 women of all different ages. Participants will be listed by gender. So participant 1, 3, 4, 8, 10, 11, 12, 16, 18, and 20 are the males. The
rest are females.

Males
Participant  1   3   4   8   10  11  12  16  18  20
Age          24  42  39  25  32  51  31  55  56  48

Females
Participant  2   5   6   7   9   13  14  15  17  19
Age          21  35  37  22  31  61  26  38  26  52

Age is suspected to also play a role in effectiveness, so within the male category, the two ages that are closest together, 24 and 25, are chosen first. Therefore,
participants 1 and 8 will form a matched pair. Participants 12 & 10, 4 & 3, 20 & 11, and 16 & 18 are also matched pairs because they are similarly aged males. The same
criterion is applied to pair similarly aged females.

Males
Pair         1       2       3       4       5
Participant  1   8   12  10  4   3   20  11  16  18
Age          24  25  31  32  39  42  48  51  55  56

Females
Pair         1       2       3       4       5
Participant  2   7   14  17  9   5   6   15  19  13
Age          21  22  26  26  31  35  37  38  52  61

Now, to continue the experiment, one of the two in the pair is randomly assigned to receive the flu vaccine and the other one will be assigned to the control group.
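
The pairing and assignment steps can be sketched in Python using the participant data from the example. Two simplifications are assumptions of this sketch rather than part of the tutorial: pairs are formed by sorting each gender by age and pairing neighbors, and a coin flip within each pair decides who gets the vaccine. The function names are illustrative.

```python
import random

# Participant number -> (gender, age), copied from the example's participant table.
participants = {
    1: ("M", 24), 2: ("F", 21), 3: ("M", 42), 4: ("M", 39), 5: ("F", 35),
    6: ("F", 37), 7: ("F", 22), 8: ("M", 25), 9: ("F", 31), 10: ("M", 32),
    11: ("M", 51), 12: ("M", 31), 13: ("F", 61), 14: ("F", 26), 15: ("F", 38),
    16: ("M", 55), 17: ("F", 26), 18: ("M", 56), 19: ("F", 52), 20: ("M", 48),
}

def make_matched_pairs(participants):
    """Within each gender, sort by age and pair neighbors in the sorted order."""
    pairs = []
    for gender in ("M", "F"):
        same_gender = sorted(
            (pid for pid, (g, _) in participants.items() if g == gender),
            key=lambda pid: participants[pid][1],
        )
        pairs.extend(zip(same_gender[0::2], same_gender[1::2]))
    return pairs

def assign_within_pairs(pairs, seed=None):
    """Flip a coin for each pair to decide who gets the vaccine."""
    rng = random.Random(seed)
    assignment = {}
    for a, b in pairs:
        vaccinated, control = (a, b) if rng.random() < 0.5 else (b, a)
        assignment[vaccinated] = "vaccine"
        assignment[control] = "control"
    return assignment

pairs = make_matched_pairs(participants)
print(pairs)
print(assign_within_pairs(pairs, seed=0))
```

For this data, pairing neighbors by age gives the same ten pairs as the example, starting with participants 1 and 8 for the men and 2 and 7 for the women.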

1b. With Subjects as Individuals


Also in a matched-pair design, each subject can be assigned to both groups instead of one, then randomly assigned the order in which treatments are applied. Each
participant then counts as his or her own matched pair. This design essentially compares someone to themselves.

IN CONTEXT
Suppose that you have a tire company that's considering rolling out a new type of rubber for its bicycle tires. There are 300 bicycles available. In a completely
randomized design, you would place the numbers 1-300 in a hat. Bikers that pull numbers 1-150 would receive old rubber tires, and those that pull 151-300 would receive
the new rubber tires. They won’t necessarily know who's getting which tires.

But what if the 300 riders don't all ride the same way or equally often? What do you do then? How do you create two groups that are roughly the same, with
the exception of the bicycle tires?

One way to do it is with a matched-pair design. You could still put the numbers 1-300 in a hat. The only difference is that the people who pull out 1-150 would
get both the old and the new. They would put the old rubber tire in the front and the new rubber tire in the back.

Then, the people who pulled out 151-300 would get the new rubber tire in the front and the old one in the back.

So there's still some randomization going on. The only difference is that every biker will get one old tire and one new tire. This will allow you to compare the tread
wear for each bike because the front and rear tire get worn somewhat equally. It won't matter how much the biker rides or where.
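
Here is a short Python sketch of the tire assignment. The `assign_tire_positions` function and the dictionary layout are illustrative choices, not part of the tutorial.

```python
import random

def assign_tire_positions(n_bikes=300, seed=None):
    """Every bike gets one old and one new tire; a random draw decides which goes in front."""
    rng = random.Random(seed)
    numbers = list(range(1, n_bikes + 1))
    rng.shuffle(numbers)  # each rider pulls one number from the hat
    layout = {}
    for bike, number in enumerate(numbers, start=1):
        if number <= n_bikes // 2:
            layout[bike] = {"front": "old", "rear": "new"}
        else:
            layout[bike] = {"front": "new", "rear": "old"}
    return layout

layout = assign_tire_positions(seed=5)
old_in_front = sum(1 for tires in layout.values() if tires["front"] == "old")
print(old_in_front, "bikes have the old tire in front")
```

Exactly half of the bikes carry the old tire in front, and the random draw decides which bikes those are. The comparison is then made within each bike, by comparing the wear on its new tire with the wear on its old tire.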

 SUMMARY

In a matched-pair design, two subjects whose characteristics are very similar are paired, then each one is sent to a different group. Alternatively, each subject can
be assigned to both groups instead of one, as was the case with the bicycle tires situation, so that every participant receives both treatments and acts as their own
matched pair.

Good luck!

 TERMS TO KNOW

Matched-Pair Design
An experimental design where two subjects who are similar with respect to variables that could affect the outcome of the experiment are paired together, then one of
them is assigned to one treatment and one is assigned to the control. This can also be done by assigning each subject to both treatments, where each subject acts as
their own matched-pair.

Surveys
by Sophia

 WHAT'S COVERED

This tutorial will briefly introduce you to surveys, demonstrating the following concepts:

1. Introduction to Surveys
2. Survey Design

1. Introduction to Surveys
A survey is a data gathering technique. It's an information collection tool, and a lot of organizations use these. Surveys allow organizations a way to gather data so that
they can target the specific information that they want.

The following are examples of how surveys can be used:

A store might use a survey to figure out something about its customers.
Politicians might use a survey to gather information about their constituents.
Someone hiring for a position in a company might use a survey to learn more about their labor market, who they can hire, and who is not available in that area, etc.

In all of these examples, the survey is a tool being used to increase the amount of specific information someone has. For each survey, the researcher has selected the
variables of interest, or the variables that he or she is interested in gathering data on.

 TERMS TO KNOW

Survey/Sample Survey
A data collection tool that individuals in a study can fill out and return to the researcher.

Variables of Interest
The variables the survey wishes to measure about those taking the survey.

2. Survey Design
A survey must be carefully designed to elicit the intended information. The survey design is an important element of surveys. If you are designing a survey, you want to get
a representative sample of your population. So as with every sampling technique, designing a survey is all about the process and being able to get accurate data from a
representative sample.

 BIG IDEA

Just like with any sample, it's important to define what you're interested in before you begin surveying.

 BRAINSTORM

You might ask yourself: What are the variables that you want to measure? What information do you want people to provide in your survey? Answering these questions
is going to be important because those answers will help you understand the purpose of the information you generate with your survey.
So, for example, if it's a survey about employment, you're going to want to ask about employment, former employment, current employment, and things like that.

IN CONTEXT
Suppose a teacher uses the following survey at the end of the year for her students:

Course Survey
Strongly Agree Agree Neutral Disagree Strongly Disagree

1. The course objectives have been clearly outlined for me. ❍ ❍ ❍ ❍ ❍

2. The methods for evaluating student work have been applied fairly. ❍ ❍ ❍ ❍ ❍

3. This course has challenged me intellectually. ❍ ❍ ❍ ❍ ❍

4. I have worked hard to meet the requirements of this course. ❍ ❍ ❍ ❍ ❍

5. This course was harder than I thought it was going to be. ❍ ❍ ❍ ❍ ❍

6. I looked forward to attending classes. ❍ ❍ ❍ ❍ ❍

7. I have learned a great deal. ❍ ❍ ❍ ❍ ❍

8. This course covered more material than I thought it was going to. ❍ ❍ ❍ ❍ ❍

9. I know more now than before taking this course. ❍ ❍ ❍ ❍ ❍

This teacher wants to know whether or not she did a good job outlining course objectives. This survey asks about evaluating student work and academic
challenge. You'll notice that she's provided answer choices from strongly agree to strongly disagree.

The teacher thought about all of the different things she wanted to learn from her students including her teaching and listed them all in her survey. The information
she gathers from this survey will help her answer the question of how clearly she outlined her course objectives for her students.

 TERM TO KNOW

Survey Design
The way the survey is set up. This deals with the wording of questions and answer choices.

 SUMMARY

To recap, surveys are used to obtain data or information from the population. It's important to determine what you want to understand, why, and for whom the data
is being collected, since these choices may impact survey design. We talked about surveys, which are also called sample surveys. We also talked about variables of
interest, which are the things that you want to measure because you're interested in knowing them.

Thank you and good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Survey Design
The way the survey is set up. This deals with the wording of questions and answer choices.

Survey/Sample Survey
A data collection tool that individuals in a study can fill out and return to the researcher.

Variables of Interest
The variables the survey wishes to measure about those taking the survey.

Blinding
by Sophia

 WHAT'S COVERED

This tutorial is going to teach you about blinding and will explain the following topics:

1. Blinding
2. Double-Blind and Single-Blind Experiments

1. Blinding
Blinding is one of those principles of experimental design whereby the subjects don't know what treatments they're going to receive.

When you randomize an experiment, it is done to reduce bias. However, it's still possible to give subjects subtle clues about which treatment they're receiving, so it's
important that the people don't know what they're receiving.

Why is this? Because it might be an incentive for them to either stay on the treatment if it's a drug or go off the treatment if they think they're not getting the real drug.

Also, it may be true that people with an agenda might want to bend the results in their favor. They might want to make the results of an experiment seem more positive
than they really are. This idea of the experimenter wanting to bend the results in their favor is called the “experimenter effect”.

To counteract both of those two ideas, we implement a strategy called blinding. Only people who are behind the scenes will know who is getting what. No one, either
directly involved in the experiment or taking any of the treatments, knows what treatments they're receiving.

IN CONTEXT
If subjects know which treatment group they are assigned to, it may influence behavior. So the treatment group will receive a pill, and the control group will
receive a pill. The only difference is that one pill has the active treatment in it and will be only given to those in the treatment group.

Ideally, when you open the pills up, they would look the same on the inside, too. The idea is that no one knows which pill is fake and which one has the tested
drug.

The fake drug is usually some kind of a sugar or something that makes the person in the control group feel like they're actually taking something when they’re
really not.

 TERM TO KNOW

Blinding
The practice of making sure that certain individuals do not know which subjects are receiving which treatment.

2. Double-Blind and Single-Blind Experiments


Many experiments are what we call double-blind. In a double-blind experiment, the subjects don't know what treatment they're receiving, nor does
anyone who has any contact with them. This can reduce bias due to a subject thinking they know what group they're in. It also reduces the experimenter effect of
someone trying to bend the results.

Single-blind experiments, on the other hand, blind only one side: either the subjects or the people in contact with them, but not both.

IN CONTEXT

A double-blind study is ideal, but sometimes it is just not feasible. Suppose there is an exercise study--whether or not exercise is effective for weight loss. People
are going to know if they're exercising or not. It's impossible to assign people to exercise--the treatment in this case--and have them not know they're receiving
the treatment.

However, the experimenters don't need to know who was assigned to exercise and who was not. This is single-blind because only the experimenters are
blinded; the subjects are not.

 BRAINSTORM

Can you think of a single-blind experiment that would be set up to have the researchers know group assignments, but the participants do not?
 TERMS TO KNOW

Double-Blind Experiment
An experiment where neither the subjects nor anyone in contact with them has any knowledge of which subjects are receiving which treatment.

Single-Blind Experiment
An experiment where either the subjects have no knowledge of which subjects are receiving which treatment or people in contact with the subjects have no
knowledge of which subjects are receiving which treatment, but not both.

 SUMMARY

Blinding is a powerful tool for preventing different types of biases, such as the experimenter effect. Different studies allow for different levels of blinding. Ideally,
double-blind is best since both participants and the people with direct contact with the participants are not aware of group assignment. As you saw in the exercise
example, sometimes double-blind just is not realistic. Participants will know if they are exercising or not. In that case, single-blind experiments are the next best
thing, which means that either the subjects or the researcher are aware of group assignments; but not both.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Blinding
The practice of making sure that certain individuals do not know which subjects are receiving which treatment.

Double-Blind Experiment
An experiment where neither the subjects, nor anyone in contact with them, has any knowledge of which subjects are receiving which treatment.

Single-Blind Experiment
An experiment where either the subjects have no knowledge of which subjects are receiving which treatment, or people in contact with the subjects have no knowledge
of which subjects are receiving which treatment, but not both.

Placebo
by Sophia

 WHAT'S COVERED

This tutorial will discuss the Placebo Effect by focusing on:

1. Placebo

1. Placebo
In basic terms, placebo is a fake treatment. That doesn’t mean that people don’t respond to it; instead, they think or expect that the treatment will result in a change. A
placebo doesn't do anything. It has no active treatment, yet people feel better anyway as if they have willed themselves to feel better. This is called the Placebo Effect.

While the treatment group gets the actual drug, the control group receives a placebo as their treatment. They get the fake drug with no active ingredient in it--usually some
kind of a sugar or something. It doesn't do anything and has no active ingredient.

Sometimes, the treatment containing the actual drug doesn't work any better than the placebo. This can happen. It’s evidence against the treatment working.

IN CONTEXT
Suppose that you developed a treatment that relieved pain and you conducted a study on pain. You had a control group receiving a sugar pill and a treatment
group receiving the actual drug that you created. Here are your results.

Would you say that your treatment is effective? Why or why not?

The answer here is that your treatment is not very effective. The numbers, 42% and 36%, are not far apart. These results would be weak evidence for the
effectiveness of the drug.

What if the results looked like this?

Notice that you still have 36% of patients in the placebo group reporting relief of pain. However, the difference between 36% and 80% is significant. This would be
considered evidence for the effectiveness of the drug.
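As a rough illustration of the comparison above, the gap between the treatment group and the placebo group can be computed in percentage points. The function name `relief_gap` is made up for this sketch, and the sketch only measures the size of the gap; deciding whether a gap is statistically significant would also require the group sizes, which the example does not give.

```python
def relief_gap(treatment_pct, placebo_pct):
    """Difference in reported pain relief, in percentage points (hypothetical helper)."""
    return treatment_pct - placebo_pct

# First scenario from the text: 42% relief with the drug vs. 36% with the placebo.
first_gap = relief_gap(42, 36)

# Second scenario: 80% relief with the drug vs. the same 36% with the placebo.
second_gap = relief_gap(80, 36)

print(first_gap, second_gap)
```

The placebo group's 36% acts as the baseline in both scenarios, so only the part of the response above that baseline counts as evidence for the drug.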

 TERMS TO KNOW

Placebo
An inert drug or treatment given to the control group. It has no active ingredient in it.

Placebo Effect
The observed phenomenon whereby certain individuals will exhibit a desired response even when taking a placebo, which contains no active ingredient.

 SUMMARY

Placebos are a form of control. They're a fake drug. People can respond to the fake drug, thinking they are receiving treatment, which is called the Placebo Effect.
Experimenters will assess the effectiveness of the treatment against the effectiveness of the placebo. If the gap between the two is significant, it is considered
evidence that treatment has a considerable effect.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Placebo
An inert drug or treatment given to the control group. It has no active ingredient in it.

Placebo Effect
The observed phenomenon whereby certain individuals will exhibit a desired response even when taking a placebo, which contains no active ingredient.

Variables
by Sophia

 WHAT'S COVERED

This tutorial will discuss variables within the field of statistics, and introduce the concept of confounding variables. The following elements will be the main focus of
this tutorial:

1. Variables
a. Variables of Interest
b. Explanatory and Response Variables
2. Confounding Variables

1. Variables
In statistics, a variable is any attribute that we can measure about a population, used in a study. It is very important to carefully define the variables to be measured when
creating a study.

Think of things that we could find out about people:

Age
Weight
Gender
Ethnicity
Favorite Food
Number of Pets
Smoker or Non-Smoker
ZIP Code
Number of Siblings
Political Affiliation
Favorite Sport

All sorts of these things are variables. You might only want to know one of these things or some of these things.

 TERM TO KNOW

Variable
Any attribute or number that can be measured about individuals in a study.

1a. Variables of Interest


For a political poll, for example, you wouldn't necessarily need to know if a person was a smoker or the number of pets they have. However, you might want to know about
their age, gender, state, political affiliations, zip code, ethnicity, and city.

Those variables could potentially have some bearing on a political poll, so they are the variables of interest for this study--literally, the variables you would be interested
in measuring.

However, if you were conducting a weight loss study, the political affiliation will likely not be a variable to measure, but favorite food might seem important.

 TERM TO KNOW

Variable of Interest
Any variable which we need to know about in the context of a study.

1b. Explanatory and Response Variables


Some studies try to determine a cause-and-effect relationship between two variables in that one variable causes the other. An increase in one corresponds to an increase
or decrease in the other.

In those cases, we define the one that causes the other as the explanatory variable. In a study, you can have more than one explanatory variable.

Then, variables that are the result are called response variables.

Examples of Explanatory and Response Variables

Explanatory: Number of hours you study
Response: Grade on the exam
You might hypothesize that as you increase the number of hours that you study, your grade on the exam will increase as well. So the number of hours you study,
therefore, helps to explain your grade.

Explanatory: Average monthly temperature
Response: Ice cream sales
You might assume that as the temperatures get warmer, ice cream sales would go up in kind.

Something that's a little bit less obvious is whether or not gender, which is a categorical variable, plays a role in which political party people will choose. Are males more
likely to be Republican? Or are women more likely to be independent voters? We don't know. But that would be an interesting question to investigate.

 TERMS TO KNOW

Explanatory Variable
A variable that we believe is predictive of something else. An increase in this variable will correspond to an increase or decrease in some other variable.

Response Variable
A variable that is affected by the explanatory variable.

2. Confounding Variables
The word confounding refers to when two variables get mixed up with one another and you can't tell the effect of one variable from the effect of the other variable. The
confounding variable is the one not accounted for in a study. It is an unseen variable that has a significant effect on the response variable and is also related to the
explanatory variable.

IN CONTEXT
Suppose that a researcher wants to know whether a high protein diet will help lab rats gain more weight than a low protein diet. The researcher has 26 lab rats
and she selects 13 of the smallest rats to receive the low protein diet and 13 of the largest to receive the high protein diet. At the end of the study, she weighs the
rats to determine their weight gain and finds that the rats on the high protein diet gained more weight.

Can you think of anything that she did wrong in this study?

The answer involves the occurrence of confounding. Remember, confounding is when two variables get mixed up and you can't tell the effect of one variable from
the effect of the other variable.

So in this case, the effect of the diets--whether or not the high protein diet caused the rats to gain more weight--was confounded by the fact that the heaviest rats
were put on the high protein diet. It’s not clear if the high protein diets were effective at weight gain. Something else may have caused the weight gain since they
were heavy already.

In this study, there were two variables of interest. The high protein diet was supposed to be the explanatory variable. The weight gain was supposed
to be the response variable. The researcher was going to try to figure out a link between the two.

However, because of the way she assigned the rats, only a limited conclusion could be drawn. She wasn't able to draw the direct conclusion that she was hoping
for--and that is confounding. Confounding should be limited in experiments when possible.
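One common way to limit this kind of confounding is to let chance, rather than rat size, decide which diet each rat receives. The sketch below is an assumption-laden illustration, not the researcher's actual procedure: the function name `randomize_rats`, the rat ID numbers 1 through 26, and the seed are all invented for the example.

```python
import random

def randomize_rats(rat_ids, seed=None):
    """Randomly split the rats into two equal diet groups (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(rat_ids)
    rng.shuffle(shuffled)  # chance, not body size, determines the order
    half = len(shuffled) // 2
    return {
        "low_protein": sorted(shuffled[:half]),
        "high_protein": sorted(shuffled[half:]),
    }

groups = randomize_rats(range(1, 27), seed=7)
print(len(groups["low_protein"]), len(groups["high_protein"]))
```

With random assignment, large and small rats are expected to be spread across both diets, so starting size is much less likely to be mixed up with the effect of the diet.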

 TRY IT

A high school math teacher, hoping to have his students do well on the final, offers an optional review session. He states, “No one who's ever attended the review
session has ever scored less than a B”.

What is the teacher trying to imply? Why isn’t his implication correct?

You may have concluded that he's trying to imply that the review sessions will cause the students to do better. That may be true; however, there may be a few
confounding variables. Maybe only his best and brightest students attend the optional review, and these are students who may have done well on the final exam
anyway. The effects, if any, are confounded by the intrinsic motivation of students to show up to the session.
 TERMS TO KNOW

Confounding
Occurs when the effects of the treatments, if any, are indistinguishable from the potential effects of some other variable which was unaccounted for.

Confounding Variable
A variable which was not accounted for in a study, which limits the conclusions that the study can draw.

 SUMMARY

Variables are what we choose to measure in a study. The variables of interest will depend on the questions that you're trying to answer. Not every variable must be
measured--just the ones that are of interest. By looking at variables in context, you learned that if a cause and effect relationship is thought to exist, you can break
the variables down even further into explanatory and response variables. Confounding occurs when there is a variable that is chosen as an explanatory variable in
an experiment, but because another variable got in the way, it cannot be determined to explain a cause. You explored confounding variables in action to
demonstrate how they can limit the conclusions that can be drawn from the supposed explanatory variable. In effect, the confounding variable inhibits a cause-
and-effect conclusion. Often, it's one that you didn't think to measure, which is problematic.

Good luck!

Source: ADAPTED FROM SOPHIA TUTORIAL BY JONATHAN OSTERS.

 TERMS TO KNOW

Confounding
Occurs when the effects of the treatments, if any, are indistinguishable from the potential effects of some other variable which was unaccounted for.

Confounding Variable
A variable which was not accounted for in a study, which limits the conclusions that the study can draw.

Explanatory Variable
A variable that we believe is predictive of something else. An increase in this variable will correspond to an increase or decrease in some other variable.

Response Variable
A variable that is affected by the explanatory variable.

Variable
Any attribute or number that can be measured about individuals in a study.

Variable of Interest
Any variable which we need to know about in the context of a study.

Question Types
by Sophia

 WHAT'S COVERED

This tutorial will cover the topic of question types. We will cover binomial questions, as well as discuss the difference between open-ended and closed questions,
through the exploration of:

1. Binomial Questions
2. Closed Questions
3. Open-ended Questions

1. Binomial Questions
Recall that there are two types of data:

Qualitative Data
Deals with categories or descriptions
Also called Categorical Data

Quantitative Data
Deals with numbers that can be measured or used to perform arithmetic
Also called Numerical Data

A binomial question is a type of question with only two answer choices. In order to understand what a binomial question is, it helps to break down the word itself. Bi means
“two” and nomial means “names”. So a binomial question is a question with two names.

Do you think that this is a qualitative type of question or a quantitative type of question?

A binomial question collects qualitative data because there are two possible responses. It's a question with two categories.

 EXAMPLE The simplest version of a binomial question is yes or no. You might remember this type of question from elementary or middle school:

Do you like me?


(Check Yes or No)

Yes No

Other examples of binomial questions include:

Do you prefer dogs or cats?


Are you a smoker or non-smoker?

In that last question, some people feel like they fall somewhere in between the two options. They may currently be a smoker, but they are trying to quit. Sometimes
questions have some shades of gray. What about this one?

"Have you ever smoked?"


This is a binomial question that would address people who don't currently smoke but used to.

Sometimes things don't neatly fit into two boxes. Nor do they work when the questions have more than two answers or are open-ended questions such as, “How do you
feel about the construction of the new baseball diamond located on the north end of town?". It doesn't really work to place something like that into two categories.

 TERM TO KNOW

Binomial Question
A question with only two answer choices.

2. Closed Questions

Many surveys have a combination of open and closed questions. Closed questions have short, definite, usually multiple choice type answers.

Your Overall Experience


Excellent Good Satisfactory Fair Poor

The Teacher ❍ ❍ ❍ ❍ ❍

Class Content ❍ ❍ ❍ ❍ ❍

The Class as a Whole ❍ ❍ ❍ ❍ ❍

What did you like about the class?

In the above example, you'll notice that the rating grid offers multiple choices --poor, fair, satisfactory, good and excellent-- and those are your only choices.

 HINT

When there are only certain answers to select, such as yes/no or multiple choice, that is the signal that you are dealing with a closed question.
 TERM TO KNOW

Closed Question
A question type with only so many different answer choices.

3. Open-ended Questions
Open questions, also called open-ended questions, are subjective. These are areas where someone can click into the field and start to type their comments and/or
opinions. These comments are open to the interpretation of the person being surveyed.

The comments are also open to the interpretation of the person conducting the survey when they do the analysis. Usually, they need to be analyzed by a person in order to
really get the full effect from it. Oftentimes, in the desire for simplicity, someone will give a question in closed form that really should be an open-ended question.

An example of an open-ended question is the final item in the survey below, "What did you like about the class?"

Your Overall Experience


Excellent Good Satisfactory Fair Poor

The Teacher ❍ ❍ ❍ ❍ ❍

Class Content ❍ ❍ ❍ ❍ ❍

The Class as a Whole ❍ ❍ ❍ ❍ ❍

What did you like about the class?

 TERM TO KNOW

Open Question
A question type with no answer choices; the respondent can choose what he or she wants to say to answer the question.

 THINK ABOUT IT

Suppose you are in a court of law and the lawyer asks, “Were you at the crime scene?”

“Yes, but I didn’t see anything other than people running and police arriving. It was chaos.”

“Just yes or no, please.”

The lawyer asked a closed question and wants only a yes/no answer. By attempting to explain your circumstances, you were trying to answer it as though it were an
open-ended question. The lawyer returns to the closed question by asking you to select either “yes” or “no.”

 SUMMARY

Binomial questions produce categorical data. These are questions with two possible responses, or two categories. It's important to consider whether or not there
really are just two categories before you ask something as a binomial question. Open questions allow for more explanation and they're sometimes difficult to
interpret because they're not very cut and dried like closed questions. Sometimes open-ended questions are called "essay" questions. Closed questions are easier
to interpret, but they're not always appropriate for the situation. Closed questions are sometimes called multiple choice type questions.

Good luck!

Source: ADAPTED FROM SOPHIA TUTORIAL BY JONATHAN OSTERS.

 TERMS TO KNOW

Binomial Question Type
A question that will yield categorical data with just two possible values.

Closed Question
A question type with only so many different answer choices.

Open Question
A question type with no answer choices; the respondent can choose what he or she wants to say to answer the question.

Accuracy and Precision in Measurements
by Sophia

 WHAT'S COVERED

This tutorial will discuss accuracy in measurement versus precision through the following exploration:

1. Contrasting Accuracy and Precision


a. Scale Example
b. Dartboard Example

1. Contrasting Accuracy and Precision


When talking about accuracy, the focus is on how close the measurement is to what the measurement should have been.

Precision, on the other hand, is concerned with how consistent the measurements are to each other. In other words, how close are the measurements to a single value,
regardless of whether or not that single value is the right answer.

 TERMS TO KNOW

Accuracy
The extent to which the values, when considered all together, center around the correct value for a variable.

Precision
The extent to which the values are very close to each other, even if they are not near the correct value.

1a. Scale Example


Suppose you work for a consumer report company that reviews personal weight scales. It’s your job to decide whether each of these scales, labeled #1, #2, #3, and #4, is
accurate, precise, both, or neither.
You take someone who weighs 161.8 pounds and place them on the four different scales, five times each.

Take a look at Scale #1 and determine if this scale is accurate, precise, both, or neither.

Scale 1
Accuracy ✔ Precision ✘

160.4 158.8 161.4 164.2 162.0

Scale #1 is accurate because the numbers average out to about 161.4, close to the right answer of 161.8. Although it reported a fairly low number such as 158.8 and a
high number of 164.2, by and large, the numbers center on what's pretty close to the right answer.

However, Scale #1 is not precise because the numbers are not close to a single value every time.

Take a look at Scale #2 and determine if this scale is accurate, precise, both, or neither.

Scale 2
Accuracy ✘ Precision ✔

168.2 167.8 167.8 168.0 168.4

You can tell just by looking at the numbers that all values are within 1 pound of each other, which means it is precise. Remember, it doesn’t need to be close to the actual
correct number, but they need to be close to each other.

But take a look at the average. The average of Scale #2 is about 168, which is overestimating by more than 6 pounds, so this scale is not accurate.

Take a look at Scale #3 and determine if this scale is accurate, precise, both, or neither.

Scale 3
Accuracy ✔ Precision ✔

161.0 161.8 161.6 162.0 161.2

All of these are within a pound of each other. They're also very close to 161.8 pounds, the true weight of the individual you selected. Having the numbers all close to each
other makes it precise, and the numbers average out to be very close to the correct weight of 161.8. Therefore, Scale #3 is both accurate and precise.

Take a look at Scale #4 and determine if this scale is accurate, precise, both, or neither.

Scale 4
Accuracy ✘ Precision ✘

161.8 170.2 165.4 168.4 164.8

It actually did get the correct weight of 161.8 once, but if you look at the five measurements taken as a whole, they're pretty far off and they tend to overestimate. They
don't really center around the right number all that much, so it’s not accurate. The numbers are also all over the place, so this scale is not precise.
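The four verdicts above can be checked numerically. In this sketch, the distance of each scale's average from the true weight stands in for accuracy, and the gap between the highest and lowest reading stands in for precision. The readings and the true weight of 161.8 pounds come from the example; the names `readings`, `summarize`, `bias`, and `spread` are chosen just for this illustration, and other measures of spread (such as standard deviation) would work equally well.

```python
from statistics import mean

TRUE_WEIGHT = 161.8  # pounds, from the example

# Five readings per scale, copied from the tables above.
readings = {
    1: [160.4, 158.8, 161.4, 164.2, 162.0],
    2: [168.2, 167.8, 167.8, 168.0, 168.4],
    3: [161.0, 161.8, 161.6, 162.0, 161.2],
    4: [161.8, 170.2, 165.4, 168.4, 164.8],
}

def summarize(values):
    """Return the average, its distance from the true weight, and the max-min spread."""
    avg = mean(values)
    return {
        "mean": round(avg, 2),
        "bias": round(avg - TRUE_WEIGHT, 2),             # near 0 suggests accuracy
        "spread": round(max(values) - min(values), 2),   # small suggests precision
    }

summary = {scale: summarize(values) for scale, values in readings.items()}
for scale, stats in summary.items():
    print(scale, stats)
```

A small bias with a large spread matches Scale #1, a large bias with a small spread matches Scale #2, both small matches Scale #3, and both large matches Scale #4.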

 BRAINSTORM

If you worked for a consumer report company and you were evaluating the above scales, which scale would you choose and why?

1b. Dartboard Example


A dartboard is a very popular example of precision and accuracy, assuming the bulls-eye is the desired outcome, or “value”.

[Figure: four dartboards arranged in a grid. Columns: Precise, Not Precise. Rows: Accurate, Not Accurate.]

For the cases above:

Precise and Accurate: In the top left corner, the darts are clumped together AND around the bulls-eye.
Not Precise, but Accurate: In the top right corner, the darts are not clumped together, but they loosely surround the bulls-eye.
Precise, but Not Accurate: In the bottom left corner, the darts are clumped together, but not around the correct “value”, or in this case, the bulls-eye.
Neither Precise nor Accurate: In the bottom right corner, the darts are spread out and are not surrounding the bulls-eye.

 SUMMARY

By contrasting accuracy and precision, you now know that accuracy is how close the measurements are to the right answer, though they may not necessarily land
exactly on the correct answer. Precision is how consistent measurements are with each other, even if they are not near the correct value. Generally, you will see
them clumped together. In a given measurement scenario, high accuracy and high precision is ideal.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Accuracy
The extent to which the values, when considered all together, center around the correct value for a variable.

Precision
The extent to which the values are very close to each other, even if they are not near the correct value.

Absolute Change and Relative Change
by Sophia

 WHAT'S COVERED

In this tutorial, you're going to learn about the difference between absolute change, which is an increase or decrease represented as a raw number, and relative
change, which relates that change differential back to the original value. Specifically, this lesson will cover:

1. Absolute Change and Relative Change


2. Calculating Absolute Change
3. Calculating Relative Change
4. Examples of Absolute Change and Relative Change

1. Absolute Change and Relative Change


Absolute change is the actual change in units. It could be the actual change in pounds, degrees, inches, percentage points, or lots of different things.

 EXAMPLE Suppose a political candidate's approval rating went up from 44% to 48%. That absolute change is four percentage points.
Relative change is the percent difference from the previous value, and it's always expressed as a percent.

 HINT

Relative change can also be referred to as the percent change.

IN CONTEXT
An infant weighed 6.5 pounds at birth, and one year later, weighed 14.5 pounds. Decide if each of the following statements is true.

Statement 1: The infant's weight change was an increase of eight pounds.

Well, that's a true statement. 14.5 minus 6.5 is 8 pounds. It increased by 8 pounds.

Statement 2: The infant's weight change was an increase of 123%.

This one's a little bit less obvious, but it's also true. The eight-pound increase was more than the entire birth weight, so it was an increase of over 100%.
In fact, when you do the calculation, 8 divided by 6.5 is about 1.23, or 123%.

 TERMS TO KNOW

Absolute Change
The raw increase or decrease in the value of a variable.

Relative Change
The percent increase or decrease in the value of a variable.

2. Calculating Absolute Change


How do you calculate absolute change? Another word for it is the absolute difference. You simply calculate the difference between the new and the old.

 FORMULA

Absolute Change = New Value - Original Value

In the example above, 14.5 minus 6.5 was a difference of 8 pounds.

It is also a positive 8 pounds because it went up, versus going down.

3. Calculating Relative Change


The relative change, or the relative difference, is calculated by taking the absolute difference and dividing it by its originating value.

 FORMULA

Relative Change = (New Value - Original Value) / Original Value, expressed as a percent

In the example above, the absolute difference was 8 pounds and the original value was 6.5 pounds. When you put 8 divided by 6.5 into a calculator, you get about 1.23.

When expressed as a percent, 1.23 is 123%. That means that there was a 123% increase over the birth weight. That was the relative change.
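
The two calculations can be sketched in a few lines of Python. This is an illustrative sketch only; the function names are hypothetical and not part of the tutorial.

```python
def absolute_change(old, new):
    """Raw change in the original units (positive means an increase)."""
    return new - old

def relative_change(old, new):
    """Change as a fraction of the original value; multiply by 100 for a percent."""
    if old == 0:
        raise ValueError("relative change is undefined when the original value is 0")
    return (new - old) / old

birth_weight, weight_at_one_year = 6.5, 14.5
print(absolute_change(birth_weight, weight_at_one_year))          # 8.0
print(f"{relative_change(birth_weight, weight_at_one_year):.0%}")  # 123%
```

Dividing by the original value is what makes the result relative: the same 8-pound gain would be a much smaller percent change for an adult.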

4. Examples of Absolute Change and Relative Change


Consider the following example that shows this year's and last year's enrollment at Memorial High School.

Video Transcription
Hey again. Let's walk through an example of how absolute change and relative change are found and the differences between them. The data we'll use is the
enrollment for this year's and last year's classes at Memorial High School. First, we'll seek to determine which class has the highest absolute change, then the highest
relative change.

Will it be the burnouts, the nerds, the geeks, or the dweebs? It's more than anyone's guess. It's statistics. Anyway, let's find out. We'll start with absolute change. To
calculate this, simply subtract last year's value from this year's. As you can see, three of the four classes had increases in enrollment.

So of the classes that had a positive absolute change, the burnouts had the highest with 310 students. Now onto relative change, which is calculated by dividing the
absolute change by the original number. With that in mind, and looking at the classes again, repeat this formula with all four groups. This is what you'll see.

The relative change for the burnouts is a sizable increase of 24%, while a more modest 10% appears for the nerds. The geeks, on the other hand, experienced a
decrease of 6%. Which finally brings us to the dweebs. While they're the smallest overall class, they have the highest relative change with an increase of 26%. The
dweebs' enrollment wasn't big to begin with, so even a modest absolute change resulted in the largest relative change.

To summarize, here's a breakdown of the distinction between the two categories. Absolute change is the difference in raw numbers. In this case, it's the actual
change in enrollment from one year to the next. Relative change, by contrast, expresses that change as a percent of the original
number.

Looking at the absolute change and relative change can tell different stories, and often times you humans find these stories are a valuable way to analyze data. There
you have it. A quick illustration of absolute change and relative change. Keep plugging away and I'll see you in the next video.

IN CONTEXT
Let's look at another example. The following table shows the results of the 1990 census and the 2000 census, along with the absolute change and relative
change.

State      1990 Population   2000 Population   Absolute Change   Relative Change
Florida         12,937,926        15,982,378         3,044,452               24%
Georgia          6,478,216         8,186,453         1,708,237               26%
Hawaii           1,108,229         1,211,537           103,308                9%
Idaho            1,006,749         1,293,953           287,204               29%
Illinois        11,430,602        12,419,293           988,691                9%
Indiana          5,544,159         6,080,485           536,326               10%
Iowa             2,776,755         2,926,324           149,569                5%
Kansas           2,477,574         2,688,418           210,844                9%

Absolute Change: To calculate the absolute change, simply subtract the 1990 value from the 2000 value. For example, Florida's absolute change can be found by
subtracting 12,937,926 from 15,982,378 to get an absolute change of 3,044,452.

All of the states in the list had increases in the population. Some were not very much, like Hawaii, which only had about a 100,000-person increase. Some were a
lot, like Georgia and Florida, which increased by over a million people. The highest absolute change was 3,044,452 people, in Florida.

Relative Change: The question of which state had the largest relative change between that time is a little bit different. Looking at Florida again, you need to figure
out if the absolute change of around 3 million was a large change percentage-wise from the old population of about 13 million. It was a large increase but was it
the largest percent increase in the list?

To find the relative change, take each absolute change and divide by the old population from 1990.

Florida's relative change was positive 24%: 3,044,452 divided by 12,937,926 is about 0.235, which rounds to 24%. Georgia's increase was about 26%, a slightly
larger percent increase than Florida's. The highest in the list was a 29% increase in the state of Idaho. Notice that Idaho didn't have a very large absolute change, but
its population wasn't very big to begin with, so even a small absolute change can be a large relative change.
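
As a sketch of how the table's last two columns could be reproduced, the snippet below recomputes both changes from the census figures shown above and finds the largest of each. The variable names are hypothetical; only the population figures come from the table.

```python
# (1990 population, 2000 population), copied from the table above.
census = {
    "Florida": (12_937_926, 15_982_378),
    "Georgia": (6_478_216, 8_186_453),
    "Hawaii": (1_108_229, 1_211_537),
    "Idaho": (1_006_749, 1_293_953),
    "Illinois": (11_430_602, 12_419_293),
    "Indiana": (5_544_159, 6_080_485),
    "Iowa": (2_776_755, 2_926_324),
    "Kansas": (2_477_574, 2_688_418),
}

# For each state: (absolute change, relative change as a fraction of 1990).
changes = {
    state: (new - old, (new - old) / old)
    for state, (old, new) in census.items()
}

largest_absolute = max(changes, key=lambda state: changes[state][0])
largest_relative = max(changes, key=lambda state: changes[state][1])

for state, (absolute, relative) in changes.items():
    print(f"{state:<10}{absolute:>12,}{relative:>8.0%}")

print("Largest absolute change:", largest_absolute)  # Florida
print("Largest relative change:", largest_relative)  # Idaho
```

The two rankings disagree, which is the point of the example: the state with the most new residents is not the state that grew fastest relative to its size.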

 SUMMARY

Absolute change is the absolute difference in raw numbers. It's the change in units. Relative change examines how the new number compares to the previous
number in terms of a percent. Did it go up by 10%? Did it go down by 7%? What happened percentage-wise from then to now?

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Absolute Change
The raw increase or decrease in the value of a variable.

Relative Change
The percent increase or decrease in the value of a variable.

 FORMULAS TO KNOW

Absolute Change = New Value - Original Value

Relative Change = (New Value - Original Value) / Original Value, expressed as a percent

Using Percentages in Statistics
by Sophia

 WHAT'S COVERED

This tutorial will discuss how to use percentages wisely in statistics by focusing on:

1. Percentage Point vs. Percent


2. Examples
a. Retaking a Test
b. A Politician's Approval Rating

1. Percentage Point vs. Percent


People tend to use percentages without really thinking about what type of percentages they're talking about. Results and statistics are often expressed as percents but it's
important to distinguish between percentage points and percents.

Percents are used to describe the relative change. Percentage points are used to measure absolute change.

 TERMS TO KNOW

Percentage Points
An absolute increase or decrease in a percent value.

Percent Change
A relative increase or decrease in a percent value.

2. Examples
2a. Retaking a Test
Suppose a teacher gave a particularly difficult exam and six students all failed it. The teacher graciously offered the students a retake, and they all passed.

The table below shows their original score and their retake score. On the retake, Jonathan scored an 88, Ryan scored a 78, Katherine scored an 84, etc.

Student     Original Score   Retake Score
Jonathan    52%              88%
Ryan        38%              78%
Katherine   61%              84%
Isaiah      44%              89%
Teri        50%              82%
Kelly       48%              95%

These changes can be expressed as either percentage points or percent increase. First, which student had the highest increase in percentage points?

Student     Original Score   Retake Score   Change in Percentage Points
Jonathan    52%              88%            36
Ryan        38%              78%            40
Katherine   61%              84%            23
Isaiah      44%              89%            45
Teri        50%              82%            32
Kelly       48%              95%            47

Jonathan went from 52% to 88%, an increase of 36 percentage points. Ryan went from 38% to 78%, an increase of 40 percentage points. Calculating this for all of
the students shows that Kelly had the largest increase, 47 percentage points.

Now, who had the highest percent increase? For this, you need to compare each student's increase in percentage points to their original
score.

Begin with Jonathan's scores. We need to determine how much of an increase 36 percentage points was over that original score of 52.

Student     Original Score   Retake Score   Change in Percentage Points   Percent Increase
Jonathan    52%              88%            36                            69%
Ryan        38%              78%            40                            105%
Katherine   61%              84%            23                            38%
Isaiah      44%              89%            45                            102%
Teri        50%              82%            32                            64%
Kelly       48%              95%            47                            98%

Jonathan's score increased by 69%. Katherine's only increased by 38% because she had a fairly high score to begin with.

But it was Ryan who had the highest percent increase. He started with a 38 and finished with a 78, a 40 percentage point increase. A 40 percentage point increase over a
score of 38 is over 100%, meaning he more than doubled his old score.
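
The contrast between the two measures can be checked with a short sketch. The scores are taken from the table above; the function and variable names are hypothetical.

```python
# (original score, retake score), both in percent, from the table above.
scores = {
    "Jonathan": (52, 88),
    "Ryan": (38, 78),
    "Katherine": (61, 84),
    "Isaiah": (44, 89),
    "Teri": (50, 82),
    "Kelly": (48, 95),
}

def point_change(original, retake):
    """Absolute change between two percent values, in percentage points."""
    return retake - original

def percent_increase(original, retake):
    """Relative change between two percent values, as a percent of the original."""
    return (retake - original) / original * 100

biggest_points = max(scores, key=lambda name: point_change(*scores[name]))
biggest_percent = max(scores, key=lambda name: percent_increase(*scores[name]))

print("Largest change in percentage points:", biggest_points)  # Kelly
print("Largest percent increase:", biggest_percent)            # Ryan
```

Kelly gained the most points, but Ryan's gain was the largest relative to where he started.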

2b. A Politician's Approval Rating

Video Transcription
Let's take a moment to look at one more example of using percentages in statistics. Suppose young Patrick hare has found his way to class president at Memorial
High School, but his approval rating has just hit the skids, dropping from 56% to 42%. Perhaps this is thanks in part to his proposal to phase out all computer
generated voices with English accents. I'm just saying.

Whatever the case, let's determine the absolute change in his approval rating. Take a moment to calculate it out. All right, here's what you should have done. Take 42
and subtract 56 from it. This gives you negative 14. So Patrick's approval rating dropped 14 percentage points. It's a drop, but looking at it that way, Patrick isn't too
concerned.

However, how does that drop look when you calculate it in terms of relative change? Again, take a moment to calculate it out. OK, here's where you start. Take the 14
percentage point drop and divide it by the original approval rating, 56. That will give you minus 0.25, or a 25% drop. Viewed in this context, Patrick sees the drop is a
significant one, which he might not have expected.

Do you see what happens, Patrick? Do you see what happens when you try to phase out a crisp and pleasant sounding computer generated English accent?

Suppose Patrick has found his way to class president at Memorial High School. But his approval rating has just hit the skids, dropping from 56% to 42%.

First, let’s determine the absolute change in his approval rating. Take 42 and subtract 56 from it.

This gives you negative 14. So Patrick's approval rating dropped 14 percentage points. It’s a drop, but looking at it that way, Patrick isn’t too concerned.

However, how does that drop look when you calculate it in terms of relative change? Take the 14 percentage point drop and divide it by the original approval rating, 56.

That will give you -0.25, or a 25% drop. Viewed in this context, Patrick sees the drop is a significant one, which he might not have expected.

 SUMMARY

When percentages are used in statistics it's important to know whether the focus is absolute change or relative change. Absolute change is the difference in
percentage points and relative change is a percent increase or percent decrease.

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Percent Change
A relative increase or decrease in a percent value.

Percentage Points
An absolute increase or decrease in a percent value.

Index Number and Reference Value
by Sophia

 WHAT'S COVERED

This tutorial is going to teach you about index numbers and reference values, through the definition and discussion of:

1. Index Numbers and Reference Values


2. Consumer Price Index and Inflation

1. Index Numbers and Reference Values


An index number is a way to measure a percent increase or decrease from one point to another. This is typically done with price changes. We set an arbitrary starting point
in time and assign that price an index number of 100. This starting price is called the reference value because we refer back to it every time the price changes.

To calculate the index value for other points in time, you would take the current price, divide by the reference value, and then convert that value to a percent.

 FORMULA

Index Number = (Current Value / Reference Value) x 100

How do we work with index numbers and reference values most of the time? Consider the following example:
In 1983 a gallon of milk cost $2.24, so you assign this reference value of $2.24 an index value of 100. Essentially this means that it cost 100% of what it cost in 1983--a
fairly obvious statement.

Year          1983    1988    1993    1998    2003
Price ($)     2.24    2.30    2.86    3.16    3.19
Index Value   100

To calculate the index value for other points in time, like 1988, when a gallon of milk cost $2.30, or 1993, when it cost $2.86, you would take the current price, divide by
the reference value of $2.24, and then convert that value to a percent.

The index value in 1988, then, is $2.30 divided by the reference value of $2.24. That gives you 1.027, which as a percent is 102.7%. Note that index values are expressed
without the percent symbol, so the index value in 1988 was 102.7. You can complete the table with the remaining values.

Year          1983    1988    1993    1998    2003
Price ($)     2.24    2.30    2.86    3.16    3.19
Index Value   100     102.7   127.7   141.1   142.4

What this indicates is that by 2003, a gallon of milk cost 142.4% of what it did in 1983, an increase of about 42% over 1983.
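
The completed table can be reproduced with a short sketch. The prices come from the table above; the function and variable names are hypothetical.

```python
# Price of a gallon of milk by year, from the table above.
milk_prices = {1983: 2.24, 1988: 2.30, 1993: 2.86, 1998: 3.16, 2003: 3.19}

def index_number(value, reference_value):
    """Express a value as a percent of the reference value (reference = 100)."""
    return value / reference_value * 100

reference_price = milk_prices[1983]  # 1983 is the reference year in this example

index_values = {
    year: round(index_number(price, reference_price), 1)
    for year, price in milk_prices.items()
}

for year, index in index_values.items():
    print(year, index)
```

Subtracting 100 from any index value gives the percent change since the reference year, which is how 142.4 translates to an increase of about 42%.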

 TERMS TO KNOW

Index Number
A way to measure the relative change in a value, usually the price of a good or service, over time. If the index number is over 100, that means the price has
increased. If the price has decreased, then the index number will be less than 100.

Reference Value
An arbitrarily chosen starting value for an index. It is assigned an index number of 100.

2. Consumer Price Index and Inflation

The most prominent index number that you see in everyday life is the Consumer Price Index. The Consumer Price Index (abbreviated as CPI) measures the percent
increase or decrease in the price of goods and services. Its reference period is 1982-84 (index = 100), often simplified to 1983, which is why 1983 was the reference year in the previous example. The U.S.
Bureau of Labor Statistics updates the CPI every month.

The CPI is a general measure of inflation. Inflation means that prices, and therefore the index, are going up. It represents a decline in purchasing power: it costs more now to buy the same
goods and services than it did then. Put another way, with the
same income, you can buy less. It may cost you much more than $100 now to buy what $100 bought in 1983.

Here's a graph of the CPI over time. Notice that the index value is 100 in 1983. Goods and services that cost $100 in 1983 cost around $200 by
about 2007, so the index value was about 200 in 2007.
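
An index can also be used to translate a price between two points in time. The sketch below is illustrative only: the function names are hypothetical, and the index values of 100 and 200 are the approximate figures described above, not official Bureau of Labor Statistics data.

```python
def price_at_index(base_price, base_index, target_index):
    """Scale a price from one index level to another."""
    if base_index <= 0:
        raise ValueError("base_index must be positive")
    return base_price * target_index / base_index

def purchasing_power(amount, base_index, target_index):
    """Value of `amount` at the target index, measured in base-period dollars."""
    if target_index <= 0:
        raise ValueError("target_index must be positive")
    return amount * base_index / target_index

# Approximate index values read from the description above (1983 = 100, 2007 = about 200).
print(price_at_index(100, 100, 200))    # 200.0
print(purchasing_power(100, 100, 200))  # 50.0
```

The second result is the decline in purchasing power: with the index at 200, $100 buys only what $50 bought in the reference year.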

 TERMS TO KNOW

Consumer Price Index


An index published by the US Bureau of Labor Statistics that shows the change in the price of many different goods or services in the United States. It provides a
measure of purchasing power.

Inflation
A relative increase in the price of a good or service over time. A person will need to pay more to receive the same good or service than they did at a previous point
in time.

 SUMMARY

Index numbers allow us to track changes, typically in prices, from one point in time to another. We begin with a reference value, which is the price at some
arbitrary point in time. Each index number expresses the price as a percent of that reference value. If the price goes up, the index number will be over 100.
If the price goes down, the index number will be under 100. The most commonly cited index is the Consumer Price Index, or CPI. The CPI shows the
percent increase or decrease in the prices of many goods and services, which helps determine the amount of inflation.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Consumer Price Index


An index published by the US Bureau of Labor Statistics that shows the change in the price of many different goods or services in the United States. It provides a
measure of purchasing power.

Index Number
A way to measure the relative change in a value, usually the price of a good or service, over time. If the index number is over 100, that means the price has increased. If
the price has decreased, then the index number will be less than 100.

Inflation
A relative increase in the price of a good or service over time. A person will need to pay more to receive the same good or service than they did at a previous point in
time.

Reference Value
An arbitrarily chosen starting value for an index. It is assigned an index number of 100.

 FORMULAS TO KNOW

Index Number = (Current Value / Reference Value) x 100

Bias
by Sophia

 WHAT'S COVERED

This tutorial will cover the topic of bias, specifically focusing on:

1. Bias
2. Hawthorne Effect

1. Bias
Most often, research is done accurately and with integrity. People want to get the job done right. They want to get the answer correct. But sometimes something
happens systematically in the experiment or study that keeps the sample from accurately representing the population being researched.

Bias, in the statistics world, is systematically misrepresenting the population. It refers to the favoring of certain outcomes in a sample that limits our ability to draw
conclusions about the population. The key word is systematically: bias is not necessarily intentional. It could be intentional, but it doesn't have to be.

A way of selecting the sample for your study such that the sample doesn't accurately reflect the population is called selection bias. It's not good, but sometimes it can't be
avoided. On the other hand, sometimes it can be avoided, but isn't.

Publication bias occurs when researchers only want to publish the most sensational findings, or rather, only the positive ones. Only the results that people will want to read
make it to people's eyeballs, while findings deemed boring do not.

 TERMS TO KNOW

Bias
The tendency for collected data to differ from what is expected in a systematic way. Biased data can often favor a specific group of those studied.

Selection Bias
Selecting a sample in such a way that certain subsets of the population are systematically excluded.

Publication Bias
The desire of researchers (and research publications) to only print the most sensational or interesting articles.

2. Hawthorne Effect
Often, people will behave differently if they know that they're under observation. They become a bit self-conscious when they are observed and want to do it “right”, so they
act differently.

This idea that people might change what they would typically do based on the fact they're under observation is a type of bias called the Hawthorne Effect.

IN CONTEXT
Suppose you are in charge of a weight loss study. One group is told to take a pill every day. The other group is also told to take a pill every day, but it doesn't have
any active ingredient in it.

You instruct them not to change their behavior. You don’t want them changing the results by eating differently or exercising more. However, these people might
change their behavior based on the fact that they know they're going to be weighed later.

Another thing to consider is when a study is based on participants volunteering their time to be a part of this study. What may happen is that only people with a passion
specific to the study may sign up, which is known as participation bias.

Furthermore, another issue may be that the participants tell you what they think you want to hear, which is response bias.

 TERMS TO KNOW

Hawthorne Effect
People have the tendency to change their behavior when they know they are being monitored.

Participation (Voluntary Response) Bias
Bias that occurs when a sample consists entirely of volunteers. People with strong opinions may be the only ones who volunteer.

Response Bias
Bias that occurs when a respondent tells the interviewer "what they want to hear" or lies due to the sensitive nature of the question.

 SUMMARY

Bias has a problematic influence on many experiments and samples. Unfortunately, when bias exists, the results received cannot be generalized to the population,
because they are not reliable. It’s important to know that bias is not always intentional. It can be a systematic flaw in the sample or the experiment, but it's not
always on purpose. Selection bias happens when the sample is not truly representative of the population to which you want to generalize the information.
Publication bias is when researchers publish only the information that they think people want to see. The Hawthorne Effect is a type of bias that happens
when people act differently, just knowing they are being observed.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Bias
The tendency for collected data to differ from what is expected in a systematic way. Biased data can often favor a specific group of those studied.

Hawthorne Effect
People have the tendency to change their behavior when they know they are being monitored.

Participation (Voluntary Response) Bias


Bias that occurs when a sample consists entirely of volunteers. People with strong opinions may be the only ones who volunteer.

Publication Bias
The desire of researchers (and research publications) to only print the most sensational or interesting articles.

Response Bias
Bias that occurs when a respondent tells the interviewer "what they want to hear" or lies due to the sensitive nature of the question.

Selection Bias
Selecting a sample in such a way that certain subsets of the population are systematically excluded.

Nonresponse and Response Bias
by Sophia

 WHAT'S COVERED

This tutorial will cover the topics of nonresponse bias and response bias by focusing on:

1. Nonresponse Bias
2. Participation Bias
3. Response Bias

1. Nonresponse Bias
A nice way to think of sampling is to use a "pot of soup" analogy. You want a representative sample, right? Well, you don't need to drink the entire pot of soup in order to
figure out what's in it. You just need the right taste.

A representative sample is like getting every ingredient of the soup in a single tasting. But certain things can go wrong with the taste test and affect what you think is in the
soup. Just as you don't really know what the population looks like, you don’t have a clear idea of all the ingredients in the soup. All you get is the taste, and if you
don't get the right taste, you're going to leave something out and not know exactly what's in the soup (or population).

In terms of sampling, nonresponse means that someone selected for the sample either can't be contacted or is unwilling to participate.

Now, nonresponse happens. It's an inevitability that you will get uncooperative people, people that don't want to take your survey or people who refuse to be part of your
experiment. It may be that you just won't be able to contact certain people.

Nonresponse only becomes a problem when the people who couldn't be contacted or refused to participate differ substantially from the people who were in the
sample. When that happens, the sample is no longer representative of the population. This is called nonresponse bias because you're not getting an accurate cross-section of opinions. The
opinions of people that you wanted to get are left out.

IN CONTEXT
A workplace wishes to survey 200 of its 1,000 employees about their workload and their stress level, so they put 200 surveys in the workers' mailboxes. It’s likely
that the people who have the biggest workloads might get left out of the sample because they don't check their mailboxes as often as other people. Or if they do
get around to checking their mailbox, they may not complete the survey, or don't return it, because they're so busy.

What effect might that have? The respondents who completed the survey may have reported that the workload level is not that high. The problem is that the
people with lower workloads are the ones who turned the survey in, because they had the time to take it, while the people with higher workloads did not.
As a result, the company might conclude that the workload level is lower than it really is.
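
A small simulation can make this effect concrete. The sketch below is purely illustrative and rests on assumptions that are not in the tutorial: workloads are drawn uniformly between 30 and 60 hours per week, and the chance of returning the survey falls linearly from 90% at 30 hours to 10% at 60 hours. All names and numbers are hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: weekly workload hours for 1,000 employees.
population = [rng.uniform(30, 60) for _ in range(1000)]

# The company puts surveys in 200 randomly chosen mailboxes.
surveyed = rng.sample(population, 200)

def response_probability(hours):
    """Assumed model: busier employees are less likely to return the survey."""
    return 0.9 - 0.8 * (hours - 30) / 30  # 0.9 at 30 hours, 0.1 at 60 hours

# Keep only the employees who (randomly) end up returning their survey.
respondents = [h for h in surveyed if rng.random() < response_probability(h)]

print(f"surveys returned:       {len(respondents)} of {len(surveyed)}")
print(f"true mean workload:     {statistics.fmean(population):.1f} hours")
print(f"mean among respondents: {statistics.fmean(respondents):.1f} hours")
```

Under these assumptions the respondents' average workload comes out below the true average, even though the 200 mailboxes were chosen at random. The bias comes from who answers, not from who was selected.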

The nonresponse rate is easy to calculate. Subtract the number of surveys you got back from the number you sent out, then divide that difference by the number you sent out.

 EXAMPLE Say you mailed out 100, and you only got 80 back. Well, that's 20 out of 100, or a 20% nonresponse rate.
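
The same calculation as a minimal sketch, with a hypothetical function name:

```python
def nonresponse_rate(sent, returned):
    """Fraction of surveys sent out that never came back."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= returned <= sent:
        raise ValueError("returned must be between 0 and sent")
    return (sent - returned) / sent

print(f"{nonresponse_rate(100, 80):.0%}")  # 20%
```

Dividing by the number sent is what turns the count of missing surveys into a rate that can be compared across surveys of different sizes.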

 THINK ABOUT IT

Consider the different ways of conducting a survey, a poll, or a sample. Which of the following methods do you think has the highest nonresponse rate?
Mail
Telephone
Face-to-Face

The answer is mail. People will either throw the survey away, forget to fill it out, or fill it out and then forget to mail it back. This is problematic because when
the United States takes its census of everyone in the country, it does so by mail. Sometimes they have to do follow-ups.

In samples with high rates of nonresponse, follow-ups typically are needed. Suppose you started with a mailing. You might need to follow up by calling people at home. If
you can't reach them by phone, you might need to follow up by going directly to their house.

Sometimes, even when they are contacted, someone will refuse to participate. Follow-ups like this might be more necessary in some areas of the country than others
because different areas of the country have different rates of nonresponse.

 TERMS TO KNOW

Nonresponse
Nonresponse is a lack of response from people you've selected. It affects the ability to draw conclusions from your sample.

Nonresponse Bias
Bias that occurs when the people who were unable to be reached or unwilling to participate in a sample have substantially different opinions than the people who
were included in the sample, resulting in a misrepresentation of the population.

2. Participation Bias
On the other end of the spectrum is when people are excessively passionate about a topic and eager to participate. The people who raise their hand to participate
are volunteering their time because they have a strong opinion about the topic at hand. Participation bias happens when people participate because they have strong
opinions about the topic, or when they are indifferent to the topic and participate only because they are being paid.

 EXAMPLE Suppose you need to gather information on an upcoming election and you ask people to participate in a focus group. In your group, you find that you
have a group in strong support of the Democratic party and you have a group in strong support of the Republican party, and no one in the middle.

To correct this, you decide you’re going to pay participants $20 for their time. Now your group is filled with people who will simply tell you what they think you want
to hear, which invites participation bias.
 TERM TO KNOW

Participation Bias
Bias that occurs when participation in a study is voluntary. People who feel strongly may be the only participants.

3. Response Bias
Response bias is when people's answers are influenced. Remember the pot of soup analogy? When you get a representative sample, that's like getting a little taste of
everything in the soup. However, things can go wrong and you don't get the right taste of the soup.

Response bias can occur if the wording of the question is unclear to the respondent, if a respondent is uncomfortable due to the sensitive or personal nature of the
questions, or if the respondent feels like the questioner is implying that the question has a "correct" response. This last case is also called social desirability bias.

IN CONTEXT
On April 20, 1993, the New York Times published an article on a survey conducted by the Roper Organization on behalf of the American Jewish Committee about
the soon-to-be opened Holocaust Museum in Washington, DC.

The newspaper reported that an astounding 22% of adults surveyed expressed some doubt as to whether the Holocaust had actually occurred. The
actual question that was presented to people was:

"Does it seem possible, or does it seem impossible to you, that the Nazi extermination of the Jews never happened?"

This seems to be a fairly straightforward question, but there was a big problem with it, and it caused response bias. The problem is that the question contained a
double negative, which is confusing. Saying it is impossible that it never happened is the same as saying you are certain that it did happen, but the
question doesn't clearly read that way.

The good thing is that, one year later, the question was revised, and it became clearer. The new question stated:

"Does it seem possible to you that the Nazi extermination of the Jews never happened, or do you feel certain that it happened?"

The revised question clearly distinguishes between the two options: "does it seem possible," or "do you feel certain?" With the
two options clearly defined, less than 2% of individuals were unsure as to whether it was real or not. This provided a more accurate picture of what the
American public felt.

Therefore, unclear questions can lead to an inaccurate representation due to response bias. The other scenario in which this can occur is when people answer a
question untruthfully because they are either ashamed or think that there's a "right" answer that someone is fishing for.

There are certain topics that are particularly sensitive and might make a person want to lie.

Topics that Could Result in a Response Bias

Drugs: This may result in many people saying they've never used drugs, whether they actually have or not. Even if there's no consequence and the survey is anonymous, they'll still say they've never used drugs when, in fact, they have.

Criminal history: Participants might say they don't have one, even if they do.

Sexual behavior: This might cover topics of a highly sensitive and personal nature.

Racial prejudice: There's an implied right answer; people don't want to say that they're racially prejudiced.

Income: People will report it as being higher than it actually is if they're of low-income status, or even possibly more surprisingly, people will report it as lower than it really is if they're of very high-income status. A lot of people don't want to be showy about their wealth, and so they'll try and come up with a more reasonable number, in their eyes.

How does this affect what we think about the population? How does this affect the "soup?"

It's like taking a sample of the soup and only tasting the things that you want to taste. Maybe you don't like beans, and so you just sort of ignore the fact that they're in
there. You don't get the overall flavor of the soup. It's the same thing with response bias. It doesn't give you the right overall picture of what
the population is really like.

 TERM TO KNOW

Response Bias
Bias that occurs when either (1) the question is poorly worded so that certain responses are over-represented, or (2) the respondent is confused by the question or
feels like they should lie due to the sensitive nature of the question.

 SUMMARY

Nonresponse bias occurs when people who are selected for the sample can't participate, either because you can't find them, or because they're actively refusing.
The biggest problem is that if you have high rates of nonresponse, it might give you an inaccurate representation of what's going on with your population. You
won't be able to use your sample to draw an inference about your population. Response bias occurs one of two ways: either a respondent doesn't understand the
question and so gives an answer that he wasn't intending; or, the respondent wants to give a supposedly correct answer to the questioner. Both of these can be
inaccurate representations of what actually is the truth about the population. Response bias is a tough thing to get rid of, especially when it is unintentional and
surrounds the wording of the questions.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Nonresponse Bias
Bias that occurs when the people who were unable to be reached or unwilling to participate in a sample have substantially different opinions than the people who were
included in the sample, resulting in a misrepresentation of the population.

Participation Bias
Bias that occurs when participation in a study is voluntary. People who feel strongly may be the only participants.

Response Bias
Bias that occurs when either (1) the question is poorly worded so that certain responses are over-represented, or (2) the respondent is confused by the question or feels
like they should lie due to the sensitive nature of the question.

Selection and Deliberate Bias
by Sophia

 WHAT'S COVERED

This tutorial will cover the topics of selection, deliberate, and unintentional bias. These may all impact the selection of the right group of people for your sample, so
it’s very important to be aware of them when attempting to generalize findings. Our discussion breaks down as follows:

1. Selection Bias
2. Random Digit Dialing
3. Deliberate Bias
4. Unintentional Bias

1. Selection Bias
You may recall that sampling is like a pot of soup. Selecting a little bit of each ingredient for the soup is like obtaining a representative sample for an experiment. However,
things can go wrong with the taste test, which may limit the ability to draw conclusions about the pot of soup as a whole.

Selection bias is also called undercoverage bias. It occurs when a significant subset of the population is left out of the sample. This is not necessarily intentional; rather,
it occurs when that subset is systematically overlooked by whoever is taking the sample.

IN CONTEXT
In 2008, almost every poll leading up to the New Hampshire presidential primary showed Barack Obama leading by at least five percentage points. All of these
polls were based on random digit dialers calling a random sample of New Hampshire households. By all accounts, the surveys were well done.

However, what happened was that Clinton gained some support in the last few days. Mainly, a lot of college students ended up coming out in support of Hillary
Clinton in the last days when people were expecting all college students to come out in support of Obama.

Because a lot of the college students are from out of state, they aren't actually New Hampshire residents. For that reason, they were not counted and, as a result,
the polls got the prediction wrong and Clinton ended up winning.

 TERM TO KNOW

Selection Bias
A bias that results from systematically excluding certain subsets of the population from the sample. It is not necessarily intentional.

2. Random Digit Dialing

The New Hampshire primary polls used random digit dialers. Random digit dialing involves using a machine to select random phone numbers from within selected area codes. The
area code itself is not necessarily chosen at random, but within the area code the machine randomly selects digits and dials the resulting phone number, after which the poll can
be conducted.

The biggest advantage of using random digit dialers is that they can reach mobile phones and unlisted numbers that you wouldn't be able to obtain using a phone book.
So, it evens the playing field a bit since anyone can be selected for that sample as long as the phone number is within that particular area code.
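
To make the idea concrete, here is a minimal sketch of how a random digit dialer might pick numbers. It is a simplification, not a description of how any polling firm actually works: the function name `random_digit_dial` is made up for illustration, and the rule that the three-digit exchange starts at 200 is an assumption. The area code 603 is used because it covers New Hampshire.

```python
import random


def random_digit_dial(area_code, count, seed=None):
    """Return `count` random phone numbers within a single area code.

    Simplified sketch: the seven digits after the area code are drawn at
    random, so listed, unlisted, and mobile numbers are all equally likely.
    """
    rng = random.Random(seed)
    numbers = []
    for _ in range(count):
        exchange = rng.randint(200, 999)  # assumption: exchanges run 200-999
        line = rng.randint(0, 9999)       # last four digits, 0000-9999
        numbers.append(f"({area_code}) {exchange:03d}-{line:04d}")
    return numbers


# The area code is fixed; only the remaining digits are random.
for number in random_digit_dial("603", 5, seed=1):
    print(number)
```

Notice that no phone book is involved anywhere, which is why unlisted numbers have the same chance of being dialed as listed ones.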

 THINK ABOUT IT

How does selection bias affect what we think is in the soup? Imagine that certain ingredients were located only in certain locations in the pot. Maybe noodles sunk to
the bottom. If you tasted only from the top, it doesn't matter how big that taste is. If you missed the noodles, you wouldn't even know they were there. That's the same
as dealing with selection bias. Because you didn't select the representative group of ingredients from the population, you don't get the right idea of what's going on. It
limits your ability to generalize your findings to the general population.
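
The point that a bigger taste does not help can be checked with a small simulation. This is only a sketch with made-up numbers: a hypothetical population of 10,000 people, of whom 2,000 are never reached by the sampler, and the two groups are assumed to answer "yes" at different rates (80% versus 40%).

```python
import random

rng = random.Random(0)

# Hypothetical population: 1 means "yes", 0 means "no".
covered = [1] * 3200 + [0] * 4800   # 8,000 reachable people, 40% say yes
excluded = [1] * 1600 + [0] * 400   # 2,000 people never sampled, 80% say yes
population = covered + excluded

true_share = sum(population) / len(population)  # 4,800 of 10,000


def biased_estimate(n):
    """Estimate the 'yes' share from a sample that can only reach `covered`."""
    sample = rng.sample(covered, n)
    return sum(sample) / n


for n in (50, 500, 4000):
    print(f"n={n}: estimate={biased_estimate(n):.3f}, true share={true_share:.3f}")
```

As the sample grows, the estimate settles near the share in the reachable group rather than the share in the whole population. A larger sample only gives a more precise look at the wrong part of the pot.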
 TERM TO KNOW

Random Digit Dialing


A method of contacting people on the phone. Random numbers are dialed, so this allows researchers to sample people with unlisted phone numbers.

3. Deliberate Bias
Deliberate bias is exactly what it sounds like: it's a bias that's done on purpose. While deliberate bias doesn’t happen very often, it can occur when there's a conflict of
interest between the people performing research and the people funding--who are usually the ones benefiting from--that research.

Typically deliberate bias is motivated by an interest unrelated to the integrity of whatever you’re researching. Most research is done with integrity, but when personal
prestige, the advancement of some ideology, or money get in the way, it’s harder to prove that intentions are pure. Politics can be an industry ripe for deliberate bias.
Perhaps people call with a poll, but the survey includes a leading question to cause the person to respond in a certain way. When this is done it's called “push polling” and
it’s highly suspect.

IN CONTEXT
Deliberate bias can happen in other areas too--even the medical field. Suppose there are two drugs: Drug A and Drug B. The company for Drug B posed the
following leading question:

“If Drug A was linked to cancer, would you be:


more likely to choose Drug B?
less likely to choose Drug B?
equally likely to choose Drug B?”

Based on how this question was posed, Drug B would be more likely to be chosen.

But there’s more. They've put a thought into the participant’s head that Drug A is linked to cancer. Did they ever explicitly say that? No, they said if it was linked to
cancer. However, now they've placed the association in the participant's mind. Subconsciously they're beginning to steer consumers away from Drug A and
towards Drug B.

If a drug company funds a study to determine if its latest drug is effective, the researchers stand to gain a lot of money and prestige if the drug is proven
effective. For this reason, they might not be the best choice to test the drug.

IN CONTEXT
An environmental research group is hired by a real estate developer to investigate the effects of a new building. If the results are favorable, they might get another
contract with that real estate developer. If the environmental research group doesn’t come through with a favorable interpretation, another group will, and that
group will get the next contract.

The environmental research group wants to be hired by the developer on another project, so there is a conflict of interest.

 TERM TO KNOW

Deliberate Bias
The purposeful misrepresentation of data for the purpose of advancing an agenda.

4. Unintentional Bias
Unintentional bias occurs when there is simply an error in the design of the study. Two types of unintentional bias include:

Response bias, which involves the wording of questions or refers to people feeling like they have to lie.
Selection bias, which involves how the sample was selected, such as when people are not included in the selection process, even though they make up a portion of
the population.

Both are simply errors with no hidden agenda. They're not intentional and are not meant to purposely steer the direction of the respondents.

 TERM TO KNOW

Unintentional Bias
Bias that is not purposeful. It exists because of errors in the design of the study.

 SUMMARY

Selection bias occurs when some subset of the population is left out. It might be intentional or unintentional. Since some section of the population is left out, the
coverage is lacking, which is why selection bias is also known as “under-coverage”. Random digit dialing is a great tool to use since it helps extend coverage to
mobile phones and unlisted numbers. Deliberate bias--a bias that is done on purpose--is not typically a cause of concern. Sometimes, however,
people with personal interests, like the advancement of an ideology or financial gain, steer results towards outcomes that are favorable to them. Most of the time,
research is done with integrity. When bias does occur, it is usually accidental, which is called unintentional bias.

Good luck!

 TERMS TO KNOW

Deliberate Bias
The purposeful misrepresentation of data for the purpose of advancing an agenda.

Random Digit Dialing


A method of contacting people on the phone. Random numbers are dialed, so this allows researchers to sample people with unlisted phone numbers.

Selection Bias
A bias that results from systematically excluding certain subsets of the population from the sample. It is not necessarily intentional.

Unintentional Bias
Bias that is not purposeful. It exists because of errors in the design of the study.

Convenience & Self-Selected Samples
by Sophia

 WHAT'S COVERED

This lesson will explain two types of samples: convenience and self-selected samples. Our discussion breaks down as follows:

1. Representative Samples
2. Non-Representative Samples
a. Convenience Samples
b. Self-selected Samples

1. Representative Samples
One of the things that we know about sampling is that it's important for a sample to be representative of the population; such a sample is known as a representative sample. What we
mean by that is when we take our sample--which is a subset of a larger population--we want this sample to behave just like the population would if we sampled them all.

 DID YOU KNOW

Now, sampling everybody is not a sample at all; that’s called a census.


We want the sample to behave as similarly to the population as possible so that when we calculate statistics from our data, the statistics are as accurate about the
population as they can be.

 BIG IDEA

The sample should represent the group/population at large, so it’s important individuals are selected carefully for the sample. That way, accurate information will be
gained and can be used to describe the group/population at large.
The goal is to generalize what is found in the sample and apply it to the population as a whole, including the people who were not sampled.

 TERM TO KNOW

Representative Sample
A sample that accurately reflects the population.

2. Non-Representative Samples
The two methods analyzed in this tutorial have major flaws--these two designs do not result in representative samples. They are conducted often, so it’s important for you
to recognize them.

2a. Convenience Samples


A convenience sample is a sample that is easily obtained. It is often not representative because people in similar locations often feel the same way.

IN CONTEXT
Suppose there is a crowd of people at a mall and there is one guy with a clipboard, and he wants some data. He might take the people nearest to him, and say,
“Hey, would you like to take my survey, please?”

The people he asks might be representative of the population, but they might not. They all simply happen to be at the same place at the same time. This means
they might have some similarities that could make them not representative of the larger population. The risk of them not representing the group/population at
large is too high.

 EXAMPLE If you ask people about their spending habits, and they all happen to be shopping in the headphones section, that probably means they have similar
ideas about how they should spend their money.
 TERM TO KNOW

Convenience Sample
A sample that is easily obtained. It is often not representative of the population.

2b. Self-selected Samples
Next, let's discuss self-selected samples, which are also called voluntary response samples. These are samples where people can choose to participate.

 EXAMPLE Focus groups are a common example of self-selected samples.


Participants who feel very strongly about the subject at hand are likely to be the ones who volunteer for a self-selected sample. On the other end of the spectrum, participants may
be compensated for their time and may simply tell the interviewer what the interviewer wants to hear.

 EXAMPLE If your focus group is about politics, you might get only the very, very liberal people or the very, very conservative people. You might get the most
extreme viewpoints but none of the viewpoints in the middle. Or, there are also a lot of people who are ambivalent about politics. They don't really care, but they
want to get paid if this is a sample that offers compensation or another type of reward like free lunch.
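
Here is a rough simulation of the first effect, where strong opinions crowd out the middle. Everything in it is hypothetical: the population of 10,000 opinions on a -2 to +2 scale, and the assumed rule that the chance of volunteering rises from 5% to 45% as opinions get stronger. It leaves out the compensation effect described above.

```python
import random

rng = random.Random(3)

# Hypothetical opinions: -2 = very liberal, 0 = ambivalent, +2 = very conservative.
population = [-2] * 1000 + [-1] * 2000 + [0] * 4000 + [1] * 2000 + [2] * 1000


def volunteers_for_study(opinion):
    """Assumed rule: the stronger the opinion, the likelier someone signs up."""
    chance = 0.05 + 0.20 * abs(opinion)  # 5%, 25%, or 45%
    return rng.random() < chance


def extreme_share(group):
    """Fraction of a group holding one of the two most extreme views."""
    return sum(1 for opinion in group if abs(opinion) == 2) / len(group)


volunteers = [opinion for opinion in population if volunteers_for_study(opinion)]

print("extreme views in the population:", extreme_share(population))
print("extreme views among volunteers: ", round(extreme_share(volunteers), 2))
```

Under these assumptions, the extreme viewpoints make up a much larger share of the volunteers than of the population, so the focus group would overstate how polarized people are.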
 TERM TO KNOW

Self-Selected (Voluntary Response) Sample


A sample that the participants choose to be a part of.

 SUMMARY

Representative samples are important if we want to accurately generalize our findings to the population. Convenience samples consist of people who are simply in
the vicinity and happen to be at the same place at the same time. Self-selected samples are also called “voluntary response” and tend to elicit either strong
opinions or no opinion at all.

Good luck!

 TERMS TO KNOW

Convenience Sample
A sample that is easily obtained. It is often not representative of the population.

Representative Sample
A sample that accurately reflects the population.

Self-Selected (Voluntary Response) Sample


A sample that the participants choose to be a part of.

Random and Systematic Errors
by Sophia

 WHAT'S COVERED

This tutorial will compare random errors vs. systematic errors. Our discussion breaks down as follows:

1. Random Errors
2. Systematic Errors

1. Random Errors
Random errors are exactly that: random. They can simply occur through no fault of the person taking the sample. When a sample is taken from a larger population, the
results cannot be known in advance, so it’s unclear whether the sample will accurately represent what the population looks like.

IN CONTEXT
Suppose there were 100 individuals, which we will consider the population. Twenty of them were college students. You select 5 people out of the overall 100 for a
sample. What would you expect to happen?

Twenty percent of the population are college students, which is one out of every 5 people. So you would probably expect one individual
within your sample of 5 people to be a college student.

However, that doesn't always happen. You might not get any college students, or all five of them may be college students. Just because you expect to get one
doesn't mean that will actually happen. Why not?

Let’s say that the individuals with numbers 1 - 20 are the college students. Numbers 21 - 100 are individuals not in college. Using a random number generator,
you might get a simple random sample that looks like this:

Sample: 85, 27, 17, 94, 74
Percentage: 1 of 5, or 20%

One out of five of those is a college student, which is 20%.

Another simple random sample might look like this:

Sample: 72, 92, 45, 20, 38
Percentage: 1 of 5, or 20%

Again, one out of five is a college student.

However, you might get a simple random sample that looks like this:

Sample: 46, 5, 83, 26, 20
Percentage: 2 of 5, or 40%

Here, the second person, number 5, and the fifth person, number 20, are college students, so two of the five people in the sample are college students. That’s 40%,
even though only 20% of the population are college students. What went wrong? Nothing went wrong--it’s just that random errors happen sometimes.
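
You can try this yourself with a random number generator. The sketch below uses the same setup as the example (individuals numbered 1 - 100, with numbers 1 - 20 being the college students), checks the three samples shown above, and then draws many more samples of five to see how often each count comes up. The seed and the choice of 10,000 repetitions are arbitrary.

```python
import random
from collections import Counter

rng = random.Random(42)

population = range(1, 101)  # individuals numbered 1 through 100


def count_students(sample):
    """Count how many sampled individuals are college students (numbers 1-20)."""
    return sum(1 for person in sample if person <= 20)


# The three samples from the example.
for sample in ([85, 27, 17, 94, 74], [72, 92, 45, 20, 38], [46, 5, 83, 26, 20]):
    print(sample, "->", count_students(sample), "of 5")

# Draw many simple random samples of size 5 and tally the results.
TRIALS = 10_000
counts = Counter(count_students(rng.sample(population, 5)) for _ in range(TRIALS))
for k in range(6):
    print(f"{k} students: {counts[k] / TRIALS:.1%} of samples")
```

Getting exactly one student is the most common outcome, but samples with zero or two students also turn up regularly. That spread is random error.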

Random error occurs when the sample, just by chance, doesn't match up perfectly with the population. Random error is not a mistake that is correctable; it is simply
something that happens when sampling randomly. While it can’t be corrected or avoided completely, the impact can be minimized by increasing the sample size or by
taking multiple samples of equal size. The larger the sample, the better the chances are that a representative group will be obtained.

 EXAMPLE Recall the example from above. Suppose that ten individuals from the group of 100 were chosen instead of five. Two college students would be
expected to make it into the sample. If the sample was off by one student, the percentage would shift by only 10 percentage points instead of 20, so chance has less impact.
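
For readers who want exact numbers rather than a simulation, the chance of getting a given number of college students can be computed by counting samples. This is a sketch using the example's figures (20 students among 100 people); the function name `prob_students` is made up for illustration.

```python
from math import comb


def prob_students(k, n, students=20, total=100):
    """Chance that a simple random sample of n people contains exactly k students.

    Counts the ways to pick k of the students and n - k of the non-students,
    divided by the number of ways to pick any n people.
    """
    return comb(students, k) * comb(total - students, n - k) / comb(total, n)


for n in (5, 10):
    print(f"sample of {n}: chance of no students at all = {prob_students(0, n):.1%}")
```

With five people, missing the college students entirely is a fairly common result. With ten people it becomes much less likely, which is one way to see the benefit of a larger sample.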
 TERM TO KNOW

Random Error
When the resulting value obtained from the sample does not match the value from the population simply by chance. This is not a mistake, but is inherent in the
variability in sampling.

2. Systematic Errors
Now, by contrast, systematic errors are mistakes. Systematic errors are due to flaws in the design.

IN CONTEXT
Suppose a school board wants to estimate how many students are eligible for free or reduced lunch. If you have an under-coverage bias, or selection bias, your
sample may be missing families from a poorer neighborhood who didn't respond to a questionnaire that was sent out. Perhaps the parents were working nights and
didn’t have time to complete the survey.

Therefore, the board may underestimate the true number of students requiring free and reduced lunch. This type of error cannot be remedied by increasing the
sample size.

 EXAMPLE A child has a growth chart in his room and his parents mistakenly put it up above the baseboard--an extra 2 inches from the floor. This is going to
result in the child thinking he’s 2 inches shorter than he actually is, an example of measurement bias, which is systematically wrong.
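
The growth chart makes a nice contrast with random error. In the sketch below, the child's true height (48 inches) and the size of the small random wobble in each reading are made-up values; only the 2-inch offset comes from the example.

```python
import random
import statistics

rng = random.Random(7)

TRUE_HEIGHT_IN = 48.0   # hypothetical true height of the child, in inches
CHART_OFFSET_IN = 2.0   # the chart is mounted 2 inches too high


def read_chart():
    """One reading from the misplaced chart: true height, minus the offset,
    plus a small random wobble (posture, eyeballing the mark, and so on)."""
    random_wobble = rng.gauss(0, 0.25)
    return TRUE_HEIGHT_IN - CHART_OFFSET_IN + random_wobble


for n in (5, 50, 5000):
    readings = [read_chart() for _ in range(n)]
    print(f"{n} readings: average = {statistics.mean(readings):.2f} inches")
```

Taking more readings smooths out the random wobble, but the average stays about 2 inches below the true height. Repetition cannot fix a systematic error; only moving the chart can.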
 TERMS TO KNOW

Systematic Error
When the resulting value obtained from the sample does not match the value from the population as a result of an incorrect measurement or bias. This is a mistake
made by the researcher.

Selection Bias
A bias that occurs when certain groups are systematically left out of the sample. This is a systematic error.

Measurement Bias
A mistake in the measurements taken in the study. This is a systematic error.

 SUMMARY

Random errors occur when the sample selected, just by chance, doesn't match up with the population. They cannot be eliminated, but using a larger sample will lessen their effect.
Conversely, systematic errors result in wrong answers or wrong values in your sample, due to some kind of bias or error with your measurement. Increasing the
sample size will not fix the issue. When a systematic error occurs, you might as well just start over, because there's no rescuing poorly collected data!

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Measurement Bias
A mistake in the measurements taken in the study. This is a systematic error.

Random Error
When the resulting value obtained from the sample does not match the value from the population simply by chance. This is not a mistake, but is inherent in the variability
in sampling.

Selection Bias
A bias that occurs when certain groups are systematically left out of the sample. This is a systematic error.

Systematic Error
When the resulting value obtained from the sample does not match the value from the population as a result of an incorrect measurement or bias. This is a mistake made
by the researcher.

Margin of Error
by Sophia

 WHAT'S COVERED

This tutorial will explain margin of error by focusing specifically on:

1. Margin of Error
2. Confidence Interval

1. Margin of Error
You may have seen something in your local newspaper stating that, for example, a political candidate leads the field by 5%, and that there is a 3% margin of error in the
poll. What does this mean?

When surveys are done, collecting enough data is important for getting a trustworthy answer. Samples are often reported with something called a margin of
error, meaning that the results may be off by a little bit, though we can estimate by how much. It tells the reader that the reported result is not exact, but it
is a close estimate.

IN CONTEXT
Suppose you are an administrator of a school and you need to determine the overall percentage of left-handed students. Maybe 10% of students in the school
are left-handed, but when you take a sample, even though you were diligent about the way data was collected, you got 8%. The answer was not accurate. What
happened?

It's possible that the sample simply did not match the population exactly. Maybe only 8% of the people in the
sample were left-handed, even though 10% of the population is left-handed. You didn't do anything wrong, but samples might be inherently off the mark due to
the random selection process.

 TERMS TO KNOW

Margin of Error
An amount by which we believe our sample's mean may deviate from the true mean of the population.

Estimate
The mean value obtained from the sample. If the sample was well-collected, the estimate should be reasonably close to the true value.

2. Confidence Interval
The confidence interval uses both the estimate and margin of error. When we combine these two parts, it gives us a range of plausible values for the true population value.

The confidence level tells us how sure we are that our interval contains the actual population value.

IN CONTEXT
Suppose a newspaper polled 500 voters and 48% responded that they were going to vote for Candidate X in the upcoming election. The newspaper might print a
margin of error along with that 48% mark; perhaps they use four percentage points as their margin of error. It's not particularly important how this 4% was
calculated, but it is important to note that a margin of error was reported along with the percent value.

What does this 4% margin of error mean? It means the researchers are pretty confident that the true percentage of people who will vote for Candidate X is within 4
percentage points of 48%, which means that it could be as low as 44%, or as high as 52%, or anywhere in between. This idea of creating some wiggle room on either side of 48% is
the confidence interval.

Suppose on election day, 46% of the people voted for Candidate X. Since this falls into the range of 44% to 52%, the confidence interval captured the true value, and 48% was a close enough estimate.
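
The tutorial does not say how the newspaper arrived at four percentage points. As an illustration only, the sketch below uses one common textbook approach for a percentage from a simple random sample: multiply the standard error of the sample proportion by about 1.96 for 95% confidence. Both the formula choice and the 95% level are assumptions, not something stated in the example.

```python
from math import sqrt

Z_95 = 1.96  # approximate multiplier for 95% confidence (assumed level)


def margin_of_error(p_hat, n, z=Z_95):
    """Textbook margin of error for a sample proportion p_hat from n responses."""
    return z * sqrt(p_hat * (1 - p_hat) / n)


def confidence_interval(estimate, moe):
    """Estimate plus or minus the margin of error."""
    return estimate - moe, estimate + moe


# 500 voters polled, 48% for Candidate X.
moe = margin_of_error(0.48, 500)
print(f"calculated margin of error: {moe:.3f}")

# Using the newspaper's rounded margin of four percentage points.
low, high = confidence_interval(0.48, 0.04)
print(f"interval: {low:.2f} to {high:.2f}")
print("election result of 46% inside interval:", low <= 0.46 <= high)
```

Under these assumptions the calculated margin comes out to roughly 4.4 percentage points, which is in line with the four points the newspaper printed.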

 THINK ABOUT IT

What happens to the margin of error as the sample size increases? Will the margin of error go up, down, or stay about the same?
As the sample size goes up, the margin of error goes down because a larger sample size gives a more accurate portrait of the population. What’s happening is that you
cast a wider net to include people that may be closer to representing the actual population.

If you had a sample size of 4 people and you want to generalize the findings to a population of 200 people, it’s unlikely that just those four people have enough of the
characteristics to represent the population.

However, when the sample size is increased, you get closer to achieving a representative sample, which means the confidence interval can be narrower; in other words, the
larger the sample size, the less wiggle room is needed on each side of the measurement.
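
To see how quickly the wiggle room shrinks, here is a short sketch. It assumes a common textbook formula for the margin of error of a percentage at roughly 95% confidence, uses 50% as the sample percentage (the case with the widest margin), and ignores the size of the population. The sample sizes are arbitrary, and the function name is made up for illustration.

```python
from math import sqrt


def margin_of_error_95(p_hat, n):
    """Approximate 95% margin of error for a sample proportion (textbook formula)."""
    return 1.96 * sqrt(p_hat * (1 - p_hat) / n)


# Each sample is four times the size of the one before it.
for n in (25, 100, 400, 1600):
    moe = margin_of_error_95(0.5, n)
    print(f"sample size {n:>4}: margin of error is about {moe * 100:.2f} percentage points")
```

Under this formula the margin of error depends on the square root of the sample size, so quadrupling the sample cuts the margin in half. Bigger samples help, but with diminishing returns.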

 TERM TO KNOW

Confidence Interval
A range of potential values that the true value could be. It is obtained by adding and subtracting the margin of error from the value in the sample.

 SUMMARY

Most statistical results are reported alongside a margin of error, which is an amount by which the sample's mean may deviate from the true mean of the population.
If the data is well-collected, then it's likely that the true population value is within the confidence interval created by the reported value, plus or minus the margin of
error. It's a bad idea to declare one value higher than another when both fall within the same confidence interval, since either one could plausibly be the true value. That would be a statistical
dead heat.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Confidence Interval
A range of potential values that the true value could be. It is obtained by adding and subtracting the margin of error from sample mean.

Estimate
The mean value obtained from the sample. If the sample was well-collected, the estimate should be reasonably close to the true value.

Margin of Error
An amount by which we believe our sample's mean may deviate from the true mean of the population.

Terms to Know
Absolute Change
The raw increase or decrease in the value of a variable

Accuracy
The extent to which the values, when considered all together, center around the correct value for a variable.

Available Data
Data collected by some other entity - a government organization or private company.

Bias
The tendency for collected data to differ from what is expected in a systematic way. Biased data can often favor a specific group of those
studied.

Binomial Question Type


A question that will yield categorical data with just two possible values.

Blinding
The practice of making sure that certain individuals do not know which subjects are receiving which treatment.

Census
Using the entire population to obtain data

Closed Question
A question type with only so many different answer choices.

Cluster Sample
A sampling method where the population is separated into groups, typically geographically, and a random selection of clusters is made. Each
individual in the cluster becomes part of the sample.

Clusters
Smaller subgroups of the population, not necessarily similar in any way besides all being together in one place, making the individuals easier to
sample together.

Completely Randomized Design


An experimental design where the assignment of subjects to treatments is done entirely at random.

Confidence Interval
A range of potential values that the true value could be. It is obtained by adding and subtracting the margin of error from sample mean.

Confounding
Occurs when the effects of the treatments, if any, are indistinguishable from the potential effects of some other variable which was unaccounted
for.

Confounding Variable
A variable which was not accounted for in a study, which limits the conclusions that the study can draw.

Consumer Price Index


An index published by the US Bureau of Labor Statistics that shows the change in the price of many different goods or services in the United
States. It provides a measure of purchasing power.

Continuous Data
Data that can take any value within an interval.

Control
The principle of experimental design that requires that other variables which may confound the experiment be held constant between the
treatment groups, so that any differences in the groups can be attributed to the different treatments.

Convenience Sample
A sample that is easily obtained. It is often not representative of the population.

Data
Information used in a study to answer a statistical question

Deliberate Bias
The purposeful misrepresentation of data for the purpose of advancing an agenda.

Descriptive statistics
Using only the information at hand to describe the selected group of individuals

Discrete Data
Data that can only take so many different values.

Double-Blind Experiment
An experiment where neither the subjects, nor anyone in contact with them, has any knowledge of which subjects are receiving which
treatment.

Estimate
The mean value obtained from the sample. If the sample was well-collected, the estimate should be reasonably close to the true value.

Experiment
A type of study where researchers impose treatments on the participants or experimental units.

Experimental Design
The way in which an experiment is carried out. A good design has key elements of randomization, replication, and control.

Experimental Unit
An animal or thing involved in an experiment.

Explanatory Variable
A variable that we believe is predictive of something else. An increase in this variable will correspond to an increase or decrease in some other
variable.

Hawthorne Effect
People have the tendency to change their behavior when they know they are being monitored.

Index Number
A way to measure the relative change in a value, usually the price of a good or service, over time. If the index number is over 100, that means
the price has increased. If the price has decreased, then the index number will be less than 100.

Inferential statistics
Using the information at hand to make a larger, more general statement about the entire population of individuals

Inflation
A relative increase in the price of a good or service over time. A person will need to pay more to receive the same good or service than they did
at a previous point in time.

Margin of Error
An amount by which we believe our sample's mean may deviate from the true mean of the population.

Matched-Pair Design
An experimental design where two subjects who are similar with respect to variables that could affect the outcome of the experiment are paired
together, then one of them is assigned to one treatment and one is assigned to the control. This can also be done by assigning each subject to
both treatments, where each subject acts as their own matched-pair.

Measurement Bias
A mistake in the measurements taken in the study. This is a systematic error.

Multi-Stage Sampling
A sampling design which combines elements of cluster sampling, stratified random sampling, and simple random sampling. It "zooms in" on
smaller areas to sample so that sampling becomes more feasible.

Nominal Data
Categorical data with qualities that cannot be ordered or ranked.

Nonresponse Bias
Bias that occurs when the people who were unable to be reached or unwilling to participate in a sample have substantially different opinions
than the people who were included in the sample, resulting in a misrepresentation of the population.

Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or outcomes in any way.

Open Question
A question type with no answer choices; the respondent can choose what he or she wants to say to answer the question.

Ordinal Data
Categorical data with qualities that can be ordered or ranked.

Participation (Voluntary Response) Bias
Bias that occurs when a sample consists entirely of volunteers. People with strong opinions may be the only ones who volunteer.

Participation Bias
Bias that occurs when participation in a study is voluntary. People who feel strongly may be the only participants.

Percent Change
A relative increase or decrease in a percent value.

Percentage Points
An absolute increase or decrease in a percent value.
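
The difference between percent change and percentage points is easy to blur, so here is a minimal sketch contrasting the two. The function names and the example rates (40% rising to 50%) are hypothetical.

```python
def percentage_point_change(old_percent, new_percent):
    """Absolute change between two percent values, in percentage points."""
    return new_percent - old_percent

def percent_change(old_percent, new_percent):
    """Relative change between two percent values, as a percent of the old value."""
    return 100 * (new_percent - old_percent) / old_percent

# Hypothetical example: an approval rate rises from 40% to 50%.
points = percentage_point_change(40, 50)
relative = percent_change(40, 50)

print(points)    # 10 percentage points
print(relative)  # 25.0 percent
```

The same change can therefore be reported as "up 10 percentage points" or "up 25 percent", depending on whether the absolute or the relative change is meant.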

Placebo
An inert drug or treatment given to the control group. It has no active ingredient in it.

Placebo Effect
The observed phenomenon whereby certain individuals will exhibit a desired response even when taking a placebo, which contains no active
ingredient.

Population
The entire set of individuals from which to sample.

Precision
The extent to which the values are very close to each other, even if they are not near the correct value.

Probability Sampling Plan
The way to collect a random sample that guarantees a certain likelihood for each member of the population to be selected.

Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the subjects as they go into the future.

Publication Bias
The desire of researchers (and research publications) to only print the most sensational or interesting articles.

Qualitative (Categorical) Data
Data that describes. It can't be measured or used for arithmetic.

Quantitative (Numerical) Data
Data that is numerical. It can be measured and it can be used for arithmetic.

Random Digit Dialing
A method of contacting people on the phone. Random numbers are dialed, so this allows researchers to sample people with unlisted phone
numbers.

Random Error
When the resulting value obtained from the sample does not match the value from the population simply by chance. This is not a mistake, but is
inherent in the variability in sampling.

Random Number Generator
A method of collecting a sample that utilizes technology to select random numbers corresponding to individuals in the population.

Random Number Table
A method of collecting a sample that uses a table of random digits to select numbers corresponding to individuals in the population. Each
individual is assigned a number, and numbers are then read from the table to choose the sample.

Random Sample
A sample that has been selected in a manner where every member of the population has some predetermined chance of being selected for the
sample.

Random Selection
The method of obtaining a random sample.

Randomization
The principle of experimental design that requires that the subjects/experimental units be assigned to groups using some random process. This
helps ensure that the groups are roughly equivalent before treatments are assigned.
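
One way to picture randomization is to shuffle the subjects and split the shuffled list in half. The sketch below assumes two groups of equal size and uses Python's standard `random` module. The function name, the subject IDs, and the seed are hypothetical.

```python
import random

def randomize(units, seed=None):
    """Randomly split the units into a treatment group and a control group."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)       # random order, so assignment is left to chance
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs 1 through 20.
treatment, control = randomize(range(1, 21), seed=1)
print(len(treatment), len(control))  # 10 10
```

Because chance alone decides who lands in which group, the researcher cannot steer particular subjects toward a treatment.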

Randomized Block Design
An experimental design where the subjects are separated into homogeneous groups, called blocks, based on some variable we think may affect
the outcome of the experiment. We then run the experiment separately within each block.

Raw Data
Data that is unorganized, unprocessed, and not summarized. Typically, this is data that is not already available.

Reference Value
An arbitrarily chosen starting value for an index. It is assigned an index number of 100.

Relative Change
The percent increase or decrease in the value of a variable.

Replication
Repeating the experiment on multiple subjects/experimental units. This principle of experimental design states that a larger experiment with
more subjects/experimental units will allow us to more clearly see differences between the treatments.

Representative Sample
A sample that accurately reflects the population.

Response Bias
Bias that occurs when a respondent tells the interviewer "what they want to hear" or lies due to the sensitive nature of the question.

Response Variable
A variable that is affected by the explanatory variable.

Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they became the way they are in the present.

Sample/Sampling
A subset of the population. There are many ways to select a sample.

Selection Bias
Selecting a sample in such a way that certain subsets of the population are systematically excluded.

Self-Selected (Voluntary Response) Sample
A sample that the participants choose to be a part of.

Simple Random Sample
A method of selection that guarantees that every sample of a certain size has an equal chance of being the selected sample.
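
As a minimal sketch of this idea, Python's standard `random.Random.sample` draws a fixed number of individuals without replacement, so every possible sample of that size is equally likely. The function name, the population of 50 numbered individuals, and the seed are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement; each size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical population: individuals numbered 1 through 50.
srs = simple_random_sample(range(1, 51), 5, seed=0)
print(srs)  # five distinct individuals chosen by chance
```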

Single-Blind Experiment
An experiment where either the subjects have no knowledge of which subjects are receiving which treatment, or people in contact with the
subjects have no knowledge of which subjects are receiving which treatment, but not both.

Statistical analysis
All the ways of collecting, analyzing, and interpreting the data.

Statistical study
A way to collect information from individuals.

Statistics
The study of collecting, analyzing, interpreting, and presenting information.

Stratified Random Sample
A random sampling method where individuals are separated into homogeneous groups, then simple random samples are taken within each
group.
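
The two steps in this definition (group first, then sample within each group) can be sketched as follows. This is only an illustration: the function name, the student roster, and the choice of class year as the grouping variable are all hypothetical, and it assumes the same number is drawn from every group.

```python
import random
from collections import defaultdict

def stratified_random_sample(individuals, stratum_of, n_per_stratum, seed=None):
    """Group individuals by stratum, then take a simple random sample in each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for individual in individuals:
        strata[stratum_of(individual)].append(individual)
    sample = []
    for members in strata.values():
        # Raises ValueError if a group has fewer than n_per_stratum members.
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical roster: four students in each of three class years.
students = [(f"{year}-{i}", year)
            for year in ("freshman", "sophomore", "junior")
            for i in range(4)]

stratified = stratified_random_sample(students, lambda s: s[1], 2, seed=0)
print(len(stratified))  # 6 (two students from each class year)
```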

Stratum/Strata
The homogeneous groups in a stratified random sample. All individuals in each stratum have something in common, and we would like to see
how that affects the outcome of the sample.

Subjects/Participants
The people or things being examined in an observational study.

Survey Design
The way the survey is set up. This deals with the wording of questions and answer choices.

Survey/Sample Survey
A data collection tool that individuals in a study can fill out and return to the researcher.

Systematic Error
When the resulting value obtained from the sample does not match the value from the population as a result of an incorrect measurement or
bias. This is a mistake made by the researcher.

Systematic Random Sample
A sampling method where every kth individual is selected for the sample (e.g., every 2nd, 4th, or 20th individual).
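
A short sketch of picking every kth individual from an ordered list follows. The random starting position among the first k individuals is an assumption added here, since the definition above does not say where to start. The function name and the population are hypothetical.

```python
import random

def systematic_random_sample(population, k, seed=None):
    """Select every kth individual, starting from a random position in the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)         # assumption: random start among the first k
    return list(population)[start::k]

# Hypothetical population: individuals numbered 1 through 100, taking every 20th.
systematic = systematic_random_sample(range(1, 101), 20, seed=0)
print(len(systematic))  # 5
```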

Treatment
Something the researchers administer to the subjects or experimental units.

Unintentional Bias
Bias that is not purposeful. It exists because of errors in the design of the study.

Variable
Any attribute or number that can be measured about individuals in a study.

Variable of Interest
Any variable which we need to know about in the context of a study.

Variables of Interest
The variables the survey wishes to measure about those taking the survey.

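Formulas to Know
Absolute Change
Absolute Change = New Value - Reference Value

Index Number
Index Number = (Value / Reference Value) x 100

Relative Change
Relative Change = ((New Value - Reference Value) / Reference Value) x 100%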