
EPPP Statistics and Research Design

In this lecture, we’ll go over Statistics and Research Design – probably not everyone’s favorite subject,
but nonetheless a crucial topic for not only conducting sound research, but also for being a good
consumer of the research literature. On the EPPP, statistics and research design only accounts for seven
percent of the content or roughly 15 exam questions. Today, we’ll be reviewing some of the topics that
you're most likely to encounter on the exam. It’s best if you familiarize yourself with the written study
materials first as they are more in-depth and will provide you a solid point of reference for the concepts
we’re reviewing. Let’s get started.

When we talk about research methods, we are really referring to the framework of what we’re going to
study and how we plan to study it. There are two broad categories of research methods: quantitative
and qualitative. Quantitative research is primarily a deductive approach to learning about various
phenomena; in other words, it works from the top-down. Quantitative research focuses on prediction
and using empirical methods and statistical procedures to assess differences, and it can be further
broken down into experimental or non-experimental research. Non-experimental research, or
descriptive research, focuses on collecting data on variables to understand their relationships rather
than manipulating variables and testing hypotheses about their impact on one another. Correlational
research, archival research, and surveys are typically non-experimental. Experimental research,
however, is conducted to test hypotheses about the effects of one or more independent variables on
one or more dependent variables. Experimental studies are characterized by the ability to randomize
subjects into treatment and control groups; this helps control for variables that are not explicitly
included in the study, which can strengthen our results. Only true experimental research provides the
amount of control necessary to conclude that observed variability in the dependent variable is caused
by variability in the independent variable. Qualitative research, on the other hand, is an inductive
approach. Working from the bottom-up, researchers identify an area of interest, collect and analyze
non-numeric data, and ultimately develop a theory from the information gathered. Qualitative research
emphasizes understanding and interpretation, often focusing on the “how” or “why” of a phenomenon.
This is typically achieved through listening to others, observing behavior, or examining records. Specific
strategies may include ethnographic and phenomenological research, focus groups, content analysis,
interviews, case studies, and narratives. Both are solid research methods with their own strengths and
weaknesses, but the EPPP places emphasis on quantitative research methods, which is the majority of
what we’ll review today.

With quantitative research, it’s important to know the difference between a population and a sample.
A population is the entire group of people who have the characteristic of interest, and a sample is a
subset of that population. For example, if we want to determine the effects of a newly developed brief
cognitive intervention on the impulsive behavior of children with ADHD, the population of interest
would likely be all children with ADHD. However, if we wanted to test the effects of the intervention, it
would be impossible to apply it to the entire population. Instead, we'd obtain a sample of children from
the population. This sample is presumed to be representative of the population.

This leads us into generalizability. Generalizability is the degree to which findings from a sample can be
attributed to the larger population from which the sample was taken. This is often achieved through
random sampling, which we will discuss shortly. Even though this is a principal method for achieving
generalizability, random sampling does not guarantee it. For instance, if you were to select a small target
population from which to draw your sample, the results may be attributed to the small target
population, but not the larger overall population. If you were to take a random sample of adults who
were diagnosed with schizophrenia, you could then ascribe the results to that population of adults who
have been diagnosed with schizophrenia, but you could not extrapolate the research findings to the
larger population of all adults. Good generalizability is achieved when the results of a study are broadly
applicable to many different types of people or situations. A larger sample size and diversity among
participants will help ensure a more representative distribution of the population is included in your
sample, and a large enough sample can usually be considered representative of groups of people to
whom results will be generalized.

Another factor that can impact design and generalizability is the sampling technique. There are three
main types of sampling used in quantitative research: simple random sampling, stratified random
sampling, and cluster sampling. Let’s look at sampling strategies that are frequently used to create a
sample that is more representative of the population and ultimately maximize generalizability.

Simple random sampling occurs when every member in the population has an equal chance of being
included in the sample, and the selection of one member has no effect on the selection of another. For
example, if we put 25 names into a hat and select one, that person had a 1 in 25 chance of being
selected. In order for the next person being selected to also have a 1 in 25 chance, the name of the
person pulled should go back into the hat or be replaced with another person’s name. Random sampling
reduces the probability that a sample will be biased in some way, which is especially true when the
sample is large.

Stratified random sampling takes the concept of random sampling noted previously and applies it
equally across levels of a group or strata such as age, ethnicity, or socioeconomic status. For example, in
an effort to assess outcomes across classes (freshman, sophomore, junior, and senior), we will want to
have equal representation across groups. To do this, we randomly sample within each group or strata,
and we would take 100 randomly sampled freshmen, 100 randomly sampled sophomores, 100 randomly
sampled juniors, and 100 randomly sampled seniors to create a total sample size of 400 students.

Cluster sampling is when simple or stratified random sampling is applied to select groups or clusters of
individuals rather than to individuals, and we either include all individuals in the selected units or
randomly select individuals from each unit. (The latter is multistage cluster sampling.) This is a useful
strategy when you can’t get access to the entire population of interest. For example, rather than finding
everyone in the country with autism, you identify 10 clinics specializing in services for individuals with
autism and randomly select from those clinics.
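To make these three strategies concrete, here is a minimal sketch in Python using only the standard
library; the population, strata, and school clusters below are entirely hypothetical.

import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical population of 400 students, each tagged with a class level
# (stratum) and a school (cluster).
population = [
    {"id": i,
     "stratum": random.choice(["freshman", "sophomore", "junior", "senior"]),
     "cluster": random.choice(["school_A", "school_B", "school_C", "school_D"])}
    for i in range(400)
]

# Simple random sampling: every member has an equal chance of selection.
simple_sample = random.sample(population, k=40)

# Stratified random sampling: sample randomly within each class level so the
# strata are equally represented.
stratified_sample = []
for level in ["freshman", "sophomore", "junior", "senior"]:
    members = [p for p in population if p["stratum"] == level]
    stratified_sample += random.sample(members, k=10)

# Cluster sampling: randomly select whole schools, then include every student
# in the selected schools.
chosen = random.sample(["school_A", "school_B", "school_C", "school_D"], k=2)
cluster_sample = [p for p in population if p["cluster"] in chosen]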

It’s important to note the difference between random sampling and random assignment. These
concepts are often confused with one another and used interchangeably; however, random selection or
sampling refers to the ways subjects are sampled from the population and helps to increase
generalizability from the sample to the population. Random assignment is the way in which subjects are
assigned to different levels of the independent variable (IV) and helps the investigator improve certainty
that the observed effects on the dependent variable (DV) were produced by the IV. It also distinguishes
true experimental research from quasi-experimental research.

Another area of consideration in research design is internal validity, which is the degree to which our
evidence supports the claim that the experimental condition/treatment is making a difference or not.
We can enhance internal validity by using strong research methods that focus on accuracy, such as using
sufficient sample sizes with good representation of the population, minimizing systematic errors, and
controlling for extraneous variables that may confound the results. Confounding variables are variables
to which changes in an outcome or dependent variable can be attributed, rather than the independent
variable that you are interested in. This can lead to incorrect conclusions about the nature of the
relationship between independent and dependent variables. Some common threats to a study’s internal
validity include attrition, testing, selection, interactions with selection, history effects, instrumentation,
maturation, and regression. Let’s take a closer look.

Attrition occurs when participants drop out of or otherwise leave a study, which means that the results
are based on a biased sample of only the people who remain in the study. History effects arise when historical
events change the outcomes of studies that are conducted over time. For example, a natural disaster or
tragic event can affect how participants act and respond. Effects of history can be minimized through
having more than one group and random assignment to groups. Maturation is the impact of time as a
variable in a study. Studies that take place over a period of time are subject to effects that occur during
that time, namely that the participants naturally grow, change, and mature over time, and these
changes may alter how they respond. This is often confused with history effects, but it is important to
note that maturation occurs within the individual as a result of natural growth, development, or aging,
while history effects are external events that happen in the individual’s environment.

Instrumentation is the impact of the test design itself on how participants respond. Certain questions or
sections may prime a participant to respond in a way that is different than they would have otherwise. A
rater may also improve their accuracy between pre- and post-test, which may erroneously suggest the
independent variable had an effect. Testing is when participants are repeatedly tested using the same
measures. Giving someone the same test several times means they are more likely to do better as they
learn the test or testing process, not because of the applied intervention. This can be addressed through
administering the DV measure only once as a post-test.

Statistical regression, or regression toward the mean, is the tendency of extreme pre-test scores, high or
low, to revert back toward the population mean. When individuals are selected for program
participation based on extreme pre-test results, their post-test scores tend to shift toward the mean
score, regardless of the efficacy of the program. Avoid selecting participants based on extreme
performance or scores.
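To see why this happens, here is a minimal simulation sketch in Python; all of the numbers are
hypothetical. Each person's observed score is their true score plus random measurement error, so people
selected for extreme pre-test scores tend to score closer to the mean on re-testing even with no
intervention at all.

import random

random.seed(1)
true_scores = [random.gauss(100, 10) for _ in range(10000)]
pre = [t + random.gauss(0, 10) for t in true_scores]   # true score + error
post = [t + random.gauss(0, 10) for t in true_scores]  # new, independent error

# Select the 500 participants with the most extreme (highest) pre-test scores.
cutoff = sorted(pre)[-500]
extreme = [i for i, score in enumerate(pre) if score >= cutoff]

mean_pre = sum(pre[i] for i in extreme) / len(extreme)
mean_post = sum(post[i] for i in extreme) / len(extreme)

# With no treatment applied, the post-test mean of the extreme group falls
# back toward the population mean of 100.
print(f"pre-test mean = {mean_pre:.1f}, post-test mean = {mean_post:.1f}")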

Selection refers to the method used to assign participants to treatment groups. This becomes an issue
when intact groups are used or there are systematic differences between the groups before applying
any other interventions. To minimize this risk, random assignment to groups is the best method; if this
isn’t possible, you should administer a pre-test to participants to determine if the groups differ with
regard to the DV from the beginning. Interactions with selection are when groups are not equal from the
start and there is an interaction with some other threat to internal validity, such as unequal groups and
a history effect, that impacts a participant’s scores on the DV.

External validity refers to the degree to which the results of the treatment or condition outcomes can be
generalized to a larger population, a different population, or another context. Factors that can threaten
external validity include interactions between testing and treatment, interaction between selection and
treatment, reactivity, and multiple treatment interference. When these threats are present, it is difficult
to generalize results to the larger population; unless efforts have been made to minimize them,
generalization is limited to only those who have had a similar experience.

Let’s move on to independent and dependent variables. When you design a research project, the first
thing you typically identify is what you’re going to study. This directly relates to your variables of
interest, which are the independent and dependent variables. Independent variables are the groups we
place participants in to examine differences. These groups may be things we can change or manipulate,
like a treatment condition, or things we can’t change, like a characteristic or trait of an individual such as
age, ethnicity, or sense of belonging. Independent variables must have at least two levels because if we
are evaluating the effects of something, it must be compared to something else, which is another level
of the same independent variable. So, if we are evaluating effectiveness of a treatment condition, we
have to have at least two treatment conditions to compare. Dependent variables, on the other hand, are
the outcomes being measured, such as scores on a standardized measure of anxiety, reaction time, or
number of tasks completed. They are an attribute, behavior, or other outcome that the research hopes
to change by applying the independent variable. In research, we hope that the independent variable, or
IV, has an effect on the dependent variable, or DV.

Let’s take a closer look. A researcher examining the effects of three treatment approaches (EMDR,
trauma focused CBT, and play therapy) on scores on the Trauma Symptom Checklist for Children assigns
participants to one of three treatment groups. These treatment groups are the independent variable –
more specifically, they are three different levels of one independent variable, which is treatment
condition. The participants complete a symptom inventory before and after treatment. The scores on
the symptom inventory are the dependent variable, or DV, because they are assessing for any changes
in symptoms as a result of applying treatment, the IV. A trick to identifying the IV and DV in a study is to
look at the title of the study. A study titled “The effects of Cognitive Therapy and Cognitive Behavior
Therapy on the impulsive behavior of children with ADHD” indicates that the type of therapy is the IV
and impulsive behavior is the DV. If a question on the exam requires you to identify a study’s
independent and dependent variables, try putting the information into the format of the title of a
research study. When you do so, the variable following the words “the effects of” will be the
independent variable, while the variable following the word “on” will be the dependent variable.

Now that you can identify variables, it’s important to know how variables can be measured, which will
ultimately help us decide which type of test to run. In order to measure different variables, we have to
look at scales of measurement, referred to with the acronym NOIR, or nominal, ordinal, interval, and
ratio. These four scales of measurement are ordered in terms of their level of mathematical complexity.

The nominal scale is the least mathematically complex and is described in terms of discrete, qualitative
categories, such as gender, ethnicity, or even assigned group numbers. These data are analyzed in terms
of the number or frequency of cases in each category. For example, 20 adults and 25 children. These
numbers cannot be aggregated to produce a mean, nor can they be used to place individuals in an
order; rather, they are simply represented as a count or tally for each group.

The second type of measurement scale is the ordinal scale. When using an ordinal scale, people can be
rank-ordered based on their status or score on the variable being measured. For example, if our variable
is satisfaction with therapy and we use a Likert scale to measure satisfaction, we're using an ordinal
scale of measurement. If ratings on our scale range from 1 to 5, with 1 indicating low satisfaction and 5
indicating high satisfaction, we can conclude that a therapy client who rates their level of satisfaction at
3 is less satisfied than a client who rates their satisfaction at 4. A limitation of the ordinal scale is that we
can't assume there are equal intervals between scale values on our Likert scale. We can't conclude that
the difference between a rating of 1 and a rating of 2 means exactly the same thing in terms of amount
of satisfaction as the difference between a rating of 2 and a rating of 3. As with nominal data, group
means cannot be calculated with ordinal data.

The next is the interval scale, which has the property of order and, as its name implies, has equal
intervals between adjacent points on the measurement scale. Many of the tests used in psychology
provide scores on an interval scale. For instance, scores on standardized IQ tests are considered to be
interval scores, and we can assume that the 10-point difference between IQ scores of 90 and 100 is
equal to the 10-point difference between IQ scores of 100 and 110. Group means can be calculated from
interval data.

Finally, the ratio scale is the most mathematically complex measurement scale as it has the properties of
order, equal intervals, and an absolute zero point. This means that a zero on the scale indicates an
absolute absence or lack of the characteristic being measured, such as weight or money. For example, if
we're assessing the effects of a smoking cessation program by measuring number of cigarettes smoked
by participants at the end of treatment, a value of zero indicates that no cigarettes were smoked.

It's important for you to be familiar with the four scales of measurement because it's highly likely that
the exam will include questions that require you to select the appropriate inferential statistical test for a
specific research study. In order to do this, you'll have to first identify the scale of measurement of the
data being analyzed. Now that we have identified types and levels of variables, we need to look at the
predictions we make about our variables, also known as hypotheses.

Remember, when conducting experimental research, researchers test hypotheses about the
relationships between independent and dependent variables. When we conduct a study to test a
hypothesis about the relationship between independent and dependent variables, we translate the
verbal hypothesis into two statistical hypotheses: a null hypothesis and an alternative hypothesis. The
null hypothesis implies that the independent variable does not have any effect on the dependent
variable, or there are no differences between groups. According to the null hypothesis, any observed
effect of an independent variable in a sample is simply the result of chance or random factors, which are
typically referred to as sampling error. In other words, the null hypothesis predicts that the independent
variable has no effect on the dependent variable and that any observed effect is due to sampling error.
The alternative hypothesis states the opposite of the null hypothesis and implies that the independent
variable does have an effect on the dependent variable and there are differences. In other words, the
alternative hypothesis predicts that any observed effect is not just the result of chance or sampling
error, but is most likely due to the independent variable. The alternative hypothesis is typically
consistent with the researcher’s verbal hypothesis since it states that there is a relationship between the
independent and dependent variables.

Note that the null and alternative hypotheses are always stated in terms of population values. In the
study on the effects of a cognitive intervention on impulsivity, the null hypothesis would be stated in a
way that implies that the population of treated children is no different from the population of untreated
children because the cognitive intervention has no effect on impulsive behavior. In contrast, the
alternative hypothesis would be stated in a way that implies that the populations of treated and
untreated children are actually different because the intervention actually does have an effect. Because
researchers don't have access to the entire population, they cannot actually prove or disprove
hypotheses about population values. Instead, researchers can only determine if hypotheses about
population values are likely given the value obtained from a sample drawn from that population.

To determine whether observed effects on a dependent variable are due to sampling error or the
independent variable, researchers use an inferential statistical test such as a t-test or analysis of
variance, both of which we’ll review later. When the results of the statistical test indicate that the
observed effects are due to sampling error, meaning there are no differences between groups, the
researcher retains the null hypothesis and rejects the alternative hypothesis. However, when the results
of the statistical test indicate that the observed effects were not due to random chance or sampling
error, the researcher rejects the null hypothesis and retains the alternative hypothesis.

When using an inferential statistical test to make a decision about the null hypothesis, we never know
for sure whether the decision we made is correct or incorrect, and in any research study there are two
possible correct decisions and two possible incorrect decisions. We can either retain a true null
hypothesis or reject a false null hypothesis. When we retain a true null hypothesis, we correctly
conclude that an independent variable has had no effect on the dependent variable and that any
observed effect was due to sampling error. Conversely, when we reject a false null hypothesis, we
correctly conclude that the independent variable does have an effect on the dependent variable and
that the observed effect is not the result of sampling error. The ability of a statistical test to allow us to
state that there is an effect, or correctly reject a false null hypothesis, is referred to as statistical power.
An inferential test has power when it allows us to reject a false null hypothesis; in other words, the more
power a test has, the more likely we are able to detect the effects of an intervention. On the exam, you
might be asked for the definition of power or about the factors that affect it. As noted in the written
study materials, power is affected by several factors. For instance, the power of a statistical test is
greater the larger the sample size and the larger the magnitude of the effect of the independent
variable, which makes us more likely to detect the effects of the intervention. The power of a statistical
test is also affected by the level of significance, which is also known as alpha. A larger level of
significance makes it easier to reject the null hypothesis, and if the null hypothesis is actually false, then
we have greater statistical power.
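As a rough illustration of these factors, here is a minimal sketch using the statsmodels library
(assuming it is available); the effect size of 0.5 is hypothetical.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Power for an independent-samples t-test with a hypothetical medium effect
# size (d = 0.5), at two sample sizes and two alpha levels.
for n in (25, 50):
    for alpha in (0.01, 0.05):
        power = analysis.solve_power(effect_size=0.5, nobs1=n, alpha=alpha)
        print(f"n per group = {n}, alpha = {alpha}: power = {power:.2f}")
# Larger samples and a larger alpha both increase power, as does a larger
# effect size.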

The incorrect decisions are discussed in terms of Type I and Type II error. Refer to the chart in your
written study materials to assist you with visually breaking down these concepts. The Type I error is
similar to the concept of a false positive. We make a Type I error when we reject a true null hypothesis
by concluding that an independent variable has had an effect on the dependent variable, but the
observed effect was actually due to sampling error. We are basically saying that differences exist when,
in fact, they do not.

The other type of decision error is called a Type II error. It occurs when we retain a false null hypothesis
and is similar to the concept of a false negative or saying that differences do not exist when they actually
do. With Type II error, we conclude that an independent variable has had no effect on the dependent
variable when, in fact, the independent variable did have an effect that we were unable to detect
because our sample was too small, we didn't administer the treatment for a long enough period of time,
or for some other reason that limited statistical power. The probability of making a Type II error is
represented by beta, and you cannot directly measure beta.

Unfortunately, we never know for sure if we've made a correct or incorrect decision about the null
hypothesis; however, we do have indirect control over statistical power, or the ability to reject a false
null hypothesis. We also have direct control over the probability of making a Type I error since that
probability is equal to alpha, which we set prior to analyzing the data we collect in a research study.

In a study examining the effects of a cognitive intervention on symptoms of impulsivity in children with
ADHD, our verbal hypothesis would be that the intervention will decrease the impulsivity of children
with ADHD. If the mean score on the measure of impulsivity is 50 for the population of untreated
children, the null hypothesis would be that the population mean equals 50, while the alternative
hypothesis would be that the population mean is less than 50. The next step is to specify the acceptable
degree of risk for a Type I error, or the risk for rejecting a true null hypothesis. This risk is represented as
alpha. For our example, let's assume that we set alpha at .05, which means that if we reject the null
hypothesis there is a five percent chance that we've made a Type I error.
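Here is a minimal sketch of that example in Python, assuming the scipy library is available; the treated
children's scores are made up for illustration.

from scipy import stats

# Hypothetical impulsivity scores for a sample of treated children.
treated_scores = [44, 47, 51, 42, 46, 49, 45, 43, 48, 41]

# H0: population mean = 50; H1: population mean < 50 (a one-tailed test).
t, p = stats.ttest_1samp(treated_scores, popmean=50, alternative="less")

alpha = 0.05  # acceptable risk of a Type I error, set before the analysis
if p < alpha:
    print(f"p = {p:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p:.4f} >= {alpha}: retain the null hypothesis")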

In most psychological research, alpha is set at .01 or .05. When alpha is set at .01, there's a one percent
chance that you will make a Type I error. And when alpha is set at .05, there's a five percent chance that
you will make a Type I error. In other words, the larger the size of alpha, the easier it is to reject the null
hypothesis, and if the null hypothesis is actually true, the greater the chance of making a Type I error.

OK, let's try a few practice questions. Please replay or pause the question to allow time to think through
the options. Here’s the question.

A researcher has made a Type I Error when they:


A) incorrectly retain a false null hypothesis.
B) incorrectly retain a true null hypothesis.
C) incorrectly reject a false null hypothesis.
D) incorrectly reject a true null hypothesis.

I’ll repeat the question (PAUSE).

A researcher has made a Type I Error when they:


A) incorrectly retain a false null hypothesis.
B) incorrectly retain a true null hypothesis.
C) incorrectly reject a false null hypothesis.
D) incorrectly reject a true null hypothesis.

This question requires you to know the definition of a Type I error. Answer A says that a Type I error
occurs when a researcher incorrectly retains a false null hypothesis. Retaining a false null hypothesis is an
incorrect decision, and it occurs when a researcher concludes that the independent variable has had no
effect on the dependent variable when it actually did. This kind of decision error is actually called a Type
II error, not a Type I error, so answer A isn't the correct response. Answer B says that a Type I error
occurs when a researcher incorrectly retains a true null hypothesis. This can't be the correct answer
because retaining a true null hypothesis would be correct, not an incorrect decision. In other words, it's
impossible to incorrectly retain a true null hypothesis. Answer C says that a Type I error occurs when a
researcher incorrectly rejects a false null hypothesis. Rejecting a false null hypothesis is another type of
correct decision, so we can also eliminate this answer. Finally, answer D says that a Type I error occurs
when a researcher incorrectly rejects a true null hypothesis; rejecting a true null hypothesis is an
incorrect decision, and it's the definition of a Type I error, so answer D is the correct response. A
researcher has made a Type I error when they incorrectly reject a true null hypothesis.

To simplify answering these types of questions, you should memorize the decision outcome table
provided in the written study materials so that you can draw the table on your white board at the test
site. Having the table to refer to will make it easier to identify the correct answer to questions on Type I
and Type II errors. Let's look at another question.

To increase statistical power, a researcher would:


A) decrease the value of alpha from .05 to .01.
B) increase the sample size from 25 to 50.
C) use a non-parametric statistical test.
D) increase the value of beta from .01 to .05.

I’ll repeat the question (PAUSE).

To increase statistical power, a researcher would:


A) decrease the value of alpha from .05 to .01.
B) increase the sample size from 25 to 50.
C) use a non-parametric statistical test.
D) increase the value of beta from .01 to .05.

This question is asking how to increase statistical power, which refers to the ability to reject a false
null hypothesis. Answer A says that a researcher will increase statistical power by decreasing alpha from .05
to .01. Recall that power is increased by increasing alpha. A larger alpha makes it easier to reject the null
hypothesis and, if the null hypothesis is actually false, power is increased. So answer A is not the correct
response because decreasing alpha from .05 to .01 would reduce power. Answer B says that power
would be increased by increasing the sample size from 25 to 50. A larger sample does increase power
because it reduces the effects of sampling error. If a sample includes one atypical person as the result of
sampling error, that person's score on the dependent variable will have a greater impact on the group
mean if the sample size is 25 than if it is 50, so answer B seems like the correct response. But a good test
taking strategy is to consider all of the answers even when you think you've already identified the
correct one. Answer C says that power will be increased by using a non-parametric test. As described in
the written study materials, non-parametric tests are used to analyze nominal and ordinal data while
parametric tests are used to analyze interval and ratio data. Non-parametric tests are used to analyze
less precise data, and they're less powerful than parametric tests. So, answer C is incorrect because
using a non-parametric test would not increase power. Finally, answer D says that power will be
increased by increasing beta from .01 to .05. If you take a look at the decision outcome table in the
written study materials, you'll see that beta is the probability of making a Type II error. This answer is
wrong for two reasons. First, increasing beta reduces power, and second, beta cannot be directly
controlled by a researcher. So the correct response is answer B, a researcher would increase power by
increasing the sample size from 25 to 50.

Now that we have identified what we want to study and the types of predictions we make, we need to
determine how to examine the data we collected. You should know that statistical methods are divided
into two types: descriptive and inferential. Descriptive statistics are used to describe a distribution of
data. Measures of central tendency such as mean, median, and mode; variability; and correlation
coefficients are descriptive statistics. Descriptive statistics give us a snapshot of what the data look like –
remember, descriptive statistics describe the data. Inferential statistics, on the other hand, allow us to
make inferences about the data. More specifically, inferential statistics help us determine whether the
data we gathered are statistically significant and generalizable beyond the research setting. Inferential
statistics are based on probability. They do not tell the researcher if a hypothesis is absolutely true or
false; rather, they allow the researcher to make conclusions based on probabilities. Let’s talk about
inferential statistics, or how to examine variables for difference.

First, you will need to know that inferential statistics can be divided into two types of tests: non-
parametric and parametric. Non-parametric tests are used with categorical data, also known as nominal
or ordinal, and when the assumptions for parametric tests have not been met. Non-parametric tests
include chi-square, Mann-Whitney, and the Wilcoxon matched-pairs test, and these tests can be
considered alternatives to their parametric counterparts. It is important to note that non-parametric
tests are less powerful. We will specifically discuss the chi-square test today. Parametric tests are used
when the data being analyzed are continuous, also known as interval or ratio data, and when certain
assumptions about the population have been met, such as a normal distribution and homoscedasticity.
Homoscedasticity refers to having the same scatter or variability in the data; in other words, when you
look at a scatterplot of data, the points have relatively similar scatter, meaning they aren’t widely varied.
Overall, parametric tests are more powerful than non-parametric tests.

There are three tests you want to be familiar with for the exam: the chi-square test, the t-test, and the
analysis of variance. To identify the appropriate tests for a particular study, the first thing you need to
determine is the scale of measurement of the data to be analyzed – remember NOIR. When the study
includes independent and dependent variables, this will be the scale of measurement of the dependent
variable. However, in some studies, a researcher is not investigating the effects of an independent
variable on a dependent variable, but simply wants to describe a sample in terms of one or more
variables. You have to determine the scale of measurement of the data being analyzed without
differentiating between independent and dependent variables. For example, a researcher might conduct
a study to determine if licensed psychologists in her state are more likely to have a Psy.D. or a Ph.D. In
this study, there is only one variable, which is type of degree, and that variable is nominal. So the
researcher would use a statistical test for nominal data. If you look at the statistical test table in the
written study materials, you'll see that the appropriate test for nominal data is the chi-square test.

There are two chi-square tests that you want to be familiar with: the single sample chi-square test and
the multiple sample chi-square test. To make it easier to remember the difference between them,
substitute the word variable for sample, so that you have the single variable and the multiple variable
chi-square tests. The single variable chi-square test is used when the study includes only one variable,
while the multiple variable chi-square test is used when a study includes two or more variables. Keep in
mind that when deciding which of the two chi-square tests to use, you do not distinguish between
independent and dependent variables but instead count the total number of variables in the study.
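Here is a minimal sketch of both tests in Python, assuming scipy is available; all of the counts are
hypothetical.

from scipy.stats import chisquare, chi2_contingency

# Single sample (single variable) chi-square test: one nominal variable, type
# of degree, with observed counts of Psy.D. and Ph.D. holders. The default
# null hypothesis is equal expected frequencies.
stat, p = chisquare([60, 40])
print(f"single sample: chi-square = {stat:.2f}, p = {p:.3f}")

# Multiple sample (multiple variable) chi-square test: two nominal variables,
# gender and type of degree, arranged as a contingency table of counts.
table = [[30, 20],   # females: Psy.D., Ph.D.
         [25, 25]]   # males:   Psy.D., Ph.D.
stat, p, dof, expected = chi2_contingency(table)
print(f"multiple sample: chi-square = {stat:.2f}, p = {p:.3f}")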

When a researcher wants to compare the number of psychologists in her state who have a Psy.D. vs.
Ph.D., the study includes a single nominal variable, so the single sample chi-square test would be the
appropriate statistical test. However, if the researcher expands her study to include gender, the study
would include two variables, gender and type of degree, and the data to be analyzed are the number of
people in each category—that is, the number of females with a Psy.D., the number of males with a
Psy.D., and so on. Because the data to be analyzed are nominal and the study includes two variables, the
multiple sample chi-square test would be the appropriate statistical test. Let's take a look at a question
on the chi-square test. Here’s the question.

A researcher conducts a study to determine if clinical psychologists with Psy.D. versus Ph.D. degrees
differ in terms of theoretical orientation. She obtains a sample of 50 psychologists with a Psy.D. and 50
psychologists with a Ph.D. and asks them to identify their theoretical orientation as either
psychodynamic, humanistic, cognitive–behavioral, or other. To analyze the data she obtains, the
psychologist should use which of the following:
A) single sample chi-square test
B) multiple sample chi-square test
C) one-way ANOVA
D) factorial ANOVA

I’ll repeat the question (PAUSE).


A researcher conducts a study to determine if clinical psychologists with Psy.D. versus Ph.D. degrees
differ in terms of theoretical orientation. She obtains a sample of 50 psychologists with a Psy.D. and 50
psychologists with a Ph.D. and asks them to identify their theoretical orientation as either
psychodynamic, humanistic, cognitive–behavioral, or other. To analyze the data she obtains, the
psychologist should use which of the following:
A) single sample chi-square test
B) multiple sample chi-square test
C) one-way ANOVA
D) factorial ANOVA

To answer this question, you need to determine the scale of measurement of the data that will be
analyzed. You could conceptualize the study as having one independent and one dependent variable,
since the researcher wants to determine the effects of type of degree on theoretical orientation. So,
type of degree is the independent variable and theoretical orientation is the dependent variable. Both
the independent and dependent variables are measured on nominal scales, and the analysis will involve
comparing the number of participants in each nominal category. Because the data to be analyzed are
nominal and there was a total of two variables, answer B is the correct response, and the appropriate
statistical test is the multiple sample chi-square test.

Now let’s take a look at parametric tests, which are used to analyze continuous data, also known as
interval and ratio data. If you look at the statistical test table in the written study materials, you'll see
there are two types of tests used for interval and ratio data: the t-test and the analysis of variance, or
ANOVA.

Let's look first at the t-test, which is always used to compare two means. There are three versions of the
t-test, and identifying the appropriate one depends on how the two means were obtained. The t-test
for a single sample is used to compare an obtained sample mean to a known population mean. For
example, if we evaluate the effects of a cognitive intervention on impulsivity by comparing the mean
impulsivity score of a sample of treated children to the known population mean for untreated children,
the t-test for a single sample would be the appropriate statistical test. In this study, the population is
serving as the no treatment control group, and we'd be comparing a sample or created group to a
known population mean.

The second type of t-test is the t-test for independent or unrelated samples, which is used to compare
the means of two unrelated groups. We might expand the cognitive intervention study by including two
groups of children with ADHD: one group that receives the cognitive intervention and one group that
does not. In this study, we’re still comparing two means, but this time they were obtained from two
independent groups rather than from a single group and the population. Consequently, the appropriate
test would be the test for independent samples.

Finally, the t-test for dependent or related samples is used to compare means from two related groups.
This would be the case if subjects were matched in pairs on an extraneous variable or other relevant
characteristic and the members of each matched pair were randomly assigned to one of the two groups.
For example, we'd have related groups in the cognitive intervention study if we start out with a sample
of 50 children with ADHD, matched the children in pairs on the basis of their symptom severity at the
beginning of the study, and then randomly assign one member of each pair to the intervention group
and the other to the no intervention group so that the two groups are equivalent in terms of initial
symptom severity. In this study, there's a relationship between the children in the two groups since
their assignment to groups involved matching them in terms of symptom severity. Consequently, the
t-test for dependent samples would be the appropriate statistical test for comparing the two means
obtained in the study. Note that the t-test for dependent samples is also the appropriate test when a
study includes a single group that will be compared to itself rather than to another group. In this type
of study, a single group of participants is administered a pre-test, then the treatment, and then a
post-test. The t-test for dependent samples is used to compare the pre- and post-test means; the two
means are related because they came from the same group of participants.

Thirty students are matched on the basis of their scores on a math aptitude test. One member of each
matched pair is randomly assigned to the experimental group, and the other is assigned to the control
group. Students in the experimental group attend a six-week mathematical problem-solving class. At the
end of the class, participants in both groups are given a math achievement test which is scored by
summing the number of items answered correctly. What is the appropriate statistical test for analyzing
the data collected in this study?
A) t-test for a single sample
B) t-test for multiple samples
C) t-test for dependent samples
D) t-test for independent samples

I’ll repeat the question (PAUSE).

Thirty students are matched on the basis of their scores on a math aptitude test. One member of each
matched pair is randomly assigned to the experimental group, and the other is assigned to the control
group. Students in the experimental group attend a six-week mathematical problem-solving class. At the
end of the class, participants in both groups are given a math achievement test which is scored by
summing the number of items answered correctly. What is the appropriate statistical test for analyzing
the data collected in this study?
A) t-test for a single sample
B) t-test for multiple samples
C) t-test for dependent samples
D) t-test for independent samples

The first step in answering this question is to identify the scale of measurement of the data that will be
analyzed. This study is being conducted to evaluate the effects of a problem solving class on math
achievement test scores. The independent variable is a problem solving class and the dependent
variable is math achievement test scores. So scores on the achievement test will be analyzed with the
statistical test. Because the scores are derived by adding the number of correct items, they are ratio
scores, and a parametric test is the appropriate type of statistical test. To determine which parametric
test to use, we have to look at the independent variable. There is one independent variable and it has
two levels: problem solving class and no problem solving class. So, we'd be using a statistical test to
compare means obtained from two groups, and the appropriate statistical test for this kind of
comparison is the t-test. Finally, since subjects were matched in terms of math aptitude before assigning
them to groups, there is a relationship between subjects in the two groups. So, answer C is the correct
response. The appropriate statistical test for comparing means from two related groups is the t-test for
dependent samples.

The last parametric test that we’ll cover is the analysis of variance or ANOVA. ANOVAs are used with
continuous data, or variables that are interval or ratio in nature. We’ll review four main types of ANOVA:
one-way ANOVA, factorial ANOVA, ANCOVA, and MANOVA. The one-way ANOVA is used when groups
are unrelated, and there is one independent variable and one dependent variable that is measured on
an interval or ratio scale. Although the one-way ANOVA can be used to compare two means, it's typically
used to compare three or more means. For example, a one-way ANOVA would be the appropriate test if
we expanded the treatment for impulsivity study by randomly assigning participants to one of three
treatment groups: a brief cognitive intervention group, a CNS stimulant drug group, or a combined
cognitive intervention plus drug group. To compare the means obtained by the three groups on a
measure of impulsivity, the one-way ANOVA would be the most appropriate test for comparing the
three means. Note that for this study, we could also use the t-test for independent samples instead of a
one-way ANOVA; however, with the t-test, we could compare only two means at a time, which means
that we'd have to conduct more than one t-test. The problem with this strategy is that, the more
statistical tests we conduct in a study, the greater the chance of making a Type I error. The advantage of
using a single one-way ANOVA instead of several t-tests is that a one-way ANOVA reduces the
probability of making a Type I error, which helps control the experiment-wise error rate. So, when a
study includes one independent variable with three or more levels, we usually conduct a one-way
ANOVA rather than several t-tests in order to control the experiment-wise error rate.
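Here is a minimal sketch of this one-way ANOVA in Python, assuming scipy is available; the impulsivity
scores for the three groups are hypothetical.

from scipy import stats

cognitive = [42, 45, 40, 44, 43, 46]
stimulant = [48, 50, 47, 49, 52, 46]
combined = [38, 41, 37, 40, 39, 42]

# One test across all three means keeps the experiment-wise Type I error rate
# at alpha, unlike running three pairwise t-tests.
f, p = stats.f_oneway(cognitive, stimulant, combined)
print(f"F = {f:.2f}, p = {p:.4f}")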

The second type of analysis of variance that you want to be familiar with is the factorial ANOVA. The
factorial ANOVA is used when a study includes two or more independent variables and one dependent
variable that is measured on an interval or ratio scale. If we expand the study on the three treatments
for impulsivity to include ADHD subtype as a second independent variable, we would use the factorial
ANOVA to analyze the effects of treatment and ADHD subtype on impulsivity scores. When a factorial
ANOVA is used to analyze data from a study that includes two independent variables, it's known as a
two-way ANOVA. When it's used to analyze data from a study that includes three independent variables,
it’s called a three-way ANOVA, and so on. When a one-way ANOVA is used to analyze the effects of one
independent variable, the results indicate if there are significant main effects of that variable. But when
a factorial ANOVA is used to analyze the effects of two or more independent variables, the results
indicate if there are significant main effects of each independent variable and if there are significant
interaction effects.

For the exam, you want to know the difference between main and interaction effects. A main effect
refers to the effects of one independent variable on the dependent variable. An independent variable
has a main effect when its different levels have different effects on the dependent variable. In contrast,
an interaction effect refers to the joint effects of two or more independent variables. There's an
interaction when the effects of one independent variable on the dependent variable differ for different
levels of another independent variable. In the treatment for impulsivity study, let's assume that type of
treatment and ADHD subtype are the independent variables and that we'll be comparing three
treatments (a brief cognitive intervention, a CNS stimulant, and a combination of the two) and two
ADHD subtypes (the predominantly hyperactive-impulsive type and the combined type). The result of a
two-way ANOVA will indicate whether or not there are significant main effects of treatment, significant
main effects of ADHD subtype, and a significant interaction between treatment and ADHD subtype. If
the results of the two-way ANOVA indicate a statistically significant interaction, this means that the
effects of treatment on impulsivity scores differ for different ADHD subtypes. More specifically, when
we look at the mean impulsivity scores obtained by each group, we might find that the cognitive
intervention alone and the combined treatment are equally effective and more effective than the CNS
stimulant alone for children with the predominantly hyperactive-impulsive type, but that the combined
treatment is more effective than either treatment alone for children with the combined type. In other
words, there's an interaction between type of treatment and ADHD subtype because the effects of the
three treatments on impulsivity differ for different subtypes.
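Here is a minimal sketch of the two-way ANOVA for this example, assuming the pandas and statsmodels
libraries are available; the data frame is hypothetical.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "treatment": ["cognitive", "stimulant", "combined"] * 8,
    "subtype": (["hyperactive"] * 3 + ["combined_type"] * 3) * 4,
    "impulsivity": [42, 48, 40, 44, 50, 36, 41, 47, 39, 45, 49, 35,
                    43, 46, 41, 46, 51, 37, 40, 48, 38, 44, 50, 36],
})

# 'C(treatment) * C(subtype)' requests both main effects and the interaction.
model = ols("impulsivity ~ C(treatment) * C(subtype)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))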

Let’s look at a practice question. Here’s the question.

To determine if method of preparing for the EPPP affects scores on the exam, a researcher obtains a
sample of 120 individuals who will be taking the exam within ten days and have prepared for it using
independent study materials only, independent study materials plus a workshop, or neither
independent study materials nor a workshop. She has all of the individuals complete a 225-item practice
exam and determines how many questions each individual answered correctly. To analyze the data
obtained, the researcher should use which of the following:
A) one-way ANOVA
B) two-way ANOVA
C) an ANCOVA
D) MANOVA

I’ll repeat the question. (PAUSE).

To determine if method of preparing for the EPPP affects scores on the exam, a researcher obtains a
sample of 120 individuals who will be taking the exam within ten days and have prepared for it using
independent study materials only, independent study materials plus a workshop, or neither
independent study materials nor a workshop. She has all of the individuals complete a 225-item practice
exam and determines how many questions each individual answered correctly. To analyze the data
obtained, the researcher should use which of the following:
A) one-way ANOVA
B) two-way ANOVA
C) an ANCOVA
D) MANOVA

The first step in breaking down this question is to identify the scale of measurement of the data to be
analyzed. The researcher is investigating the effects of method of preparing for the licensing exam on
exam scores, so the independent variable is method of preparation and the dependent variable is score
on the practice exam. Since scores on the practice exam are number of correct items, they represent a
ratio scale of measurement, and the researcher will use a parametric test. To identify which parametric
test she should use, look at the independent variable. There is one independent variable, and it has
three levels: independent study materials only, independent study materials plus workshop, and neither
independent study materials nor workshop. In other words, the psychologist will be using a statistical
test to compare the mean exam scores obtained by three groups, so answer A is the correct response.
The appropriate statistical test for comparing three means with one independent variable is the one-
way ANOVA. Note that if the study included two independent variables instead, like study method and
type of degree, the correct response would have been answer B, the two-way ANOVA. We haven’t
discussed answer C and D yet, but if you are taking the exam and come across unfamiliar concepts, use
the strategies you know. Identify the IV and DV and their scales of measurement, the number of
independent variables, and the levels of the independent variables. Let’s move on to answers C and D.

The ANCOVA, which stands for the analysis of covariance, is useful when a researcher knows that the
groups are not initially equivalent with regard to an extraneous variable. It allows the researcher to
statistically remove the effects of the extraneous variable on the dependent variable, so it's easier to
detect the effects of the independent variable. For example, in the treatment for impulsivity study, we
might discover that, because we weren't able to randomly assign subjects to groups, most of the
subjects in one group began the study with more severe symptoms than subjects in the other groups. In
this situation, symptom severity is an extraneous variable and we could use the ANCOVA to statistically
remove its effects from the dependent variable. In essence, the ANCOVA would equalize all subjects
with regard to initial severity of symptoms so that it would be possible to conclude that any differences
between the groups at the end of the study are due to the effects of the treatments and not to initial
differences in symptom severity.
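Here is a minimal sketch of an ANCOVA using the statsmodels formula interface (assuming it is
available); initial symptom severity is the covariate, and all values are hypothetical.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "treatment": ["cognitive"] * 5 + ["stimulant"] * 5,
    "initial_severity": [60, 72, 55, 68, 63, 75, 70, 78, 66, 74],
    "impulsivity": [40, 50, 37, 46, 43, 52, 49, 55, 45, 51],
})

# Including the covariate statistically removes the effect of initial severity
# before the treatment groups are compared.
model = ols("impulsivity ~ C(treatment) + initial_severity", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))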

The MANOVA stands for multivariate analysis of variance. The MANOVA is used when a study includes
one or more independent variables and two or more dependent variables on an interval or ratio scale.
The MANOVA allows a researcher to simultaneously analyze the effects of one or more independent
variables on all of the dependent variables. Similar to the advantage of using an ANOVA instead of
multiple t-tests, an advantage of using the MANOVA is that it helps reduce the experiment-wise error
rate. For instance, if we want to compare the effects of three treatment conditions on impulsivity scores
and attention span scores, we have two dependent variables. We could either conduct two one-way
ANOVAs, one for each dependent variable, or one MANOVA. Conducting a MANOVA would reduce the
number of statistical tests we conduct, which reduces the chance of making a Type I error.
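Here is a minimal sketch of a MANOVA, again assuming pandas and statsmodels are available; the scores
are hypothetical.

import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.DataFrame({
    "treatment": ["cognitive", "stimulant", "combined"] * 6,
    "impulsivity": [42, 48, 38, 44, 50, 36, 41, 47, 39,
                    45, 49, 35, 43, 46, 41, 46, 51, 37],
    "attention": [30, 25, 33, 28, 24, 35, 31, 26, 32,
                  29, 23, 34, 30, 27, 33, 28, 25, 36],
})

# Both dependent variables are analyzed simultaneously, which avoids running
# two separate ANOVAs and inflating the experiment-wise error rate.
fit = MANOVA.from_formula("impulsivity + attention ~ treatment", data=df)
print(fit.mv_test())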

The last topic in inferential statistics is multivariate techniques. These techniques are used to make
predictions and include multiple regression analysis, canonical correlation, discriminant function
analysis, and logistic regression. The main thing you want to know about these techniques for the exam
is when each one is used and the factors you consider when making a decision to use a specific model.
Before diving in, we need to clarify the meaning of a couple of terms. In experimental research, we
distinguish between independent and dependent variables, but in correlational research, we use the
terms predictor and criterion, with the predictor being equivalent to the independent variable and the
criterion being equivalent to the dependent variable. So, in describing the multivariate techniques used
for prediction, which all make use of correlation, we’ll be referring to predictors and criteria instead of
independent and dependent variables.

Let's start with multiple regression analysis which is an extension of regression analysis. Both of these
techniques are used when the goal is to predict or estimate scores on a single criterion being measured
on an interval or ratio scale. As an example, regression analysis is the appropriate technique when a
score on a measure of job knowledge will be used to predict a score on a measure of job performance.
In this situation, there is only one predictor, the measure of job knowledge, and one criterion, the
measure of job performance. However, if we use score on a measure of job knowledge and number of
years of previous experience to predict job performance score, we use multiple regression analysis,
which is the appropriate technique when the goal is to use two or more predictors to predict a score on
a single criterion. Regression analysis and multiple regression analysis both provide a correlation
coefficient and an equation that allows us to predict the criterion score from scores on the predictor or
predictors. In regression analysis, the correlation coefficient is symbolized with the lower case r and
indicates the degree of association between the predictor and criterion. The correlation coefficient can
be squared to obtain a coefficient of determination, which is interpreted as a measure of shared
variability. As an example, if the correlation coefficient for job knowledge and job performance scores
is .60, this means that 36% of variability, or .60 squared, is shared by the two measures. In other words,
36% of variability in job performance is explained by or accounted for by variability in job knowledge.

Multiple regression analysis also provides a correlation coefficient, but it's referred to as the multiple
correlation coefficient and is symbolized with the capital letter R. The multiple correlation coefficient
can also be interpreted directly as a measure of the degree of association between the predictors and
the criterion, and it can be squared to obtain a coefficient of multiple determination, which is
interpreted as a measure of variability shared by all of the predictors and the criterion.
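Here is a minimal worked sketch of both r and R in Python, assuming numpy is available; the job
knowledge, experience, and performance scores are hypothetical.

import numpy as np

job_knowledge = np.array([55, 60, 65, 70, 75, 80, 85, 90])
experience = np.array([1, 3, 2, 5, 4, 7, 6, 9])
performance = np.array([50, 58, 60, 68, 70, 79, 80, 88])

# Regression analysis: one predictor, one criterion. Squaring r gives the
# coefficient of determination (shared variability).
r = np.corrcoef(job_knowledge, performance)[0, 1]
print(f"r = {r:.2f}, r squared = {r ** 2:.2f}")

# Multiple regression analysis: two predictors, one criterion. R is the
# correlation between predicted and actual criterion scores, and R squared is
# the coefficient of multiple determination.
X = np.column_stack([np.ones_like(job_knowledge), job_knowledge, experience])
coefs, *_ = np.linalg.lstsq(X, performance, rcond=None)
R = np.corrcoef(X @ coefs, performance)[0, 1]
print(f"R = {R:.2f}, R squared = {R ** 2:.2f}")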

The next technique is canonical correlation. This statistic is used when two or more predictors will be
used to predict scores on two or more criteria that are each measured on an interval or ratio scale. A
canonical regression equation is used to make predictions about a person's scores on each criterion
based on their scores on the predictors. Canonical correlation is the appropriate technique if we want to
use scores on a measure of job knowledge and years of previous experience to predict scores on several
measures of job performance, such as number of units produced, number of errors made, and number
of days absent per year. Canonical correlation also provides a correlation coefficient that indicates the
degree of association between the predictors and the criteria.
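For canonical correlation, here's a minimal sketch using scikit-learn's CCA with simulated stand-in data (two predictors, three criteria; the variables are placeholders, not real measures):

```python
# A minimal sketch with simulated data: canonical correlation between two
# predictors and three criteria, using scikit-learn's CCA.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                    # e.g., knowledge, experience
Y = X @ rng.normal(size=(2, 3)) + rng.normal(size=(50, 3))  # three criteria

cca = CCA(n_components=1).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)

# Canonical correlation: association between the weighted combinations
# (canonical variates) of the predictors and the criteria
print(np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1])
```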

The third multivariate technique for prediction is discriminant function analysis. It's used when the goal
is to use two or more predictors to predict or estimate status on a single criterion that is measured on a
nominal scale. In other words, discriminant function analysis is used to assign people to categories
(nominal scale) rather than to predict criterion scores. Discriminant function analysis can be used to
predict whether new graduate students are likely to graduate or drop out based on their undergraduate
GPA, GRE verbal score, and GRE quantitative score. In this example, the criterion is completion of graduate school, which is measured in terms of two nominal categories: graduate vs. dropout.
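Here's a minimal sketch of that graduate school example using scikit-learn's LinearDiscriminantAnalysis; the GPA and GRE values below are invented for illustration:

```python
# A minimal sketch with invented applicant data: discriminant function
# analysis assigning cases to a nominal criterion (graduate vs. dropout)
# from three predictors: undergraduate GPA, GRE verbal, GRE quantitative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[3.8, 160, 158], [2.9, 148, 150], [3.5, 155, 162],
              [2.7, 145, 144], [3.9, 165, 167], [3.1, 150, 149],
              [3.6, 158, 160], [2.8, 146, 147]])
y = np.array(["graduate", "dropout", "graduate", "dropout",
              "graduate", "dropout", "graduate", "dropout"])

lda = LinearDiscriminantAnalysis().fit(X, y)

# Assign a new applicant to one of the two nominal categories
print(lda.predict(np.array([[3.4, 152, 156]])))
```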

Finally, the last multivariate technique for prediction is logistic regression. Logistic regression is similar
to discriminant function analysis but is less restrictive in terms of its assumptions. For example, the use
of discriminant function analysis is based on the assumptions that predictor scores are normally
distributed, linearly related, and have equal variances within groups. Use of logistic regression doesn't
require that these assumptions be met. So, if the data do not meet one or more of the assumptions for discriminant function analysis, a researcher could use logistic regression instead.
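As a minimal sketch with the same kind of invented data, here's logistic regression for the graduate-vs.-dropout prediction; notice that it returns predicted probabilities as well as predicted categories:

```python
# A minimal sketch with invented data: logistic regression predicting a
# dichotomous criterion (1 = graduate, 0 = dropout) without the
# distributional assumptions discriminant function analysis requires.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[3.8, 160, 158], [2.9, 148, 150], [3.5, 155, 162],
              [2.7, 145, 144], [3.9, 165, 167], [3.1, 150, 149]])
y = np.array([1, 0, 1, 0, 1, 0])

clf = LogisticRegression().fit(X, y)

# Probability of each category plus the predicted category for a new case
print(clf.predict_proba(np.array([[3.4, 152, 156]])))
print(clf.predict(np.array([[3.4, 152, 156]])))
```

Let's look at a question for multivariate techniques. Here's the question.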

A researcher wants to use age at onset of symptoms, WHODAS 2.0 score, and number of relatives who
have received a diagnosis of a psychotic disorder to predict whether patients with schizophrenia are at-
risk or not at-risk for relapse following their first hospitalization. The appropriate technique is:
A) canonical correlation.
B) multiple regression analysis.
C) discriminant function analysis.
D) analysis of covariance.

I’ll repeat the question (PAUSE).

A researcher wants to use age at onset of symptoms, WHODAS 2.0 score, and number of relatives who
have received a diagnosis of a psychotic disorder to predict whether patients with schizophrenia are at-
risk or not at-risk for relapse following their first hospitalization. The appropriate technique is:
A) canonical correlation.
B) multiple regression analysis.
C) discriminant function analysis.
D) analysis of covariance.

In the situation described, the researcher is using several predictors to predict status on a single
criterion. This indicates that we will use a multivariate technique for prediction. The predictors include
age at onset of symptoms, WHODAS 2.0 score, and number of relatives diagnosed with a psychotic
disorder. The criterion is risk of relapse following the first hospitalization, which is measured as at-risk or
not at-risk. Now, let’s look at the scale of measurement. The single criterion, which is status in terms of
relapse, is measured on a nominal scale. Patients will be categorized as either at-risk or not at-risk for
relapse following their first hospitalization. Now, let's look at the answers. Answer A is canonical
correlation, which is used when there are two or more predictors and two or more criteria that are each
measured on an interval or ratio scale. So, answer A isn't the correct response since this study includes
a single nominal criterion. Answer B is multiple regression analysis, which is a technique for using two or
more predictors to predict a score on a single criterion that is measured on an interval or ratio scale, so
answer B isn't the correct response either. Answer C is discriminant function analysis, which is used
when there are two or more predictors and a single nominal criterion, which is the case in this situation.
The researcher is using three predictors to categorize patients as either at-risk or not at-risk for relapse.
Answer C seems like the correct response, but let's look at answer D, which is the analysis of covariance.
The analysis of covariance is an inferential statistical test that's used to control the effects of an
extraneous variable, so we can eliminate this answer. That makes answer C the correct response.
Discriminant function analysis is the appropriate technique for using three predictors to categorize
patients as either at-risk or not at-risk for relapse.

This concludes the section on inferential statistics. Now that you have the basics of research design and
statistical methods, let’s shift gears slightly to review a few evaluation strategies and techniques
associated with program development. When developing programs, it is important to have an
understanding of what the need is or what gaps should be addressed with a new program or service.
This is achieved through a needs assessment. A needs assessment is a systematic process for
determining a gap or discrepancy (a need) between what is (current condition) and what should be
(desired condition). Needs assessments are usually conducted during the initial stages of program implementation to determine needs and fine-tune program development to meet the needs of the stakeholders. The general steps for performing a needs assessment involve exploring and identifying the current and desired conditions, gathering and analyzing data to understand the gap between them, identifying stakeholders or target groups and prioritizing needs, and designing a program to address or minimize the gap in services or conditions. In addition to a needs assessment, a
cost-benefit analysis is conducted at the beginning of a program and compares the costs of the resources a program uses to the value of its outcomes. It provides stakeholders with a ratio that assesses costs relative to effects, or results, and it answers the question, “How much bang for your buck?”
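As a toy illustration with invented figures (not from any actual program), the ratio is simply benefits divided by costs:

```python
# A toy illustration with invented figures: the benefit-cost ratio is the
# dollar value of a program's outcomes divided by the cost of its resources.
program_costs = 100_000     # e.g., staffing, materials, facilities
program_benefits = 250_000  # estimated dollar value of the outcomes

ratio = program_benefits / program_costs
print(f"benefit-cost ratio = {ratio:.1f}")  # 2.5 -> $2.50 returned per $1 spent
```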

Once a program is in the development and implementation phases, it must continue to be evaluated to
ensure the program is effective and stakeholders are getting their needs met. Common evaluation
strategies include process and implementation evaluation, formative and summative evaluations, and
outcome evaluations. Process or implementation evaluation determines whether program activities have been implemented as intended and whether the expected outcomes were produced. A process evaluation tells how and why a program achieved its goals and is conducted as soon as the program begins.
Formative program evaluations are designed to help improve programs while they are forming and are usually conducted when a new program is being developed, adapted, or modified. They allow for modifications to be made before full program implementation. In contrast, summative program evaluations focus on results, products, or impact and are conducted towards the end of a course, program, or intervention. They provide evidence of the program’s effectiveness and help to determine if a
program should be continued, altered, or replicated. Outcome evaluation is used to determine the
overall effectiveness of a program, i.e., whether it was successful in achieving its goals and met its intended
outcomes. Outcome evaluation answers the question, “What effect did the program have on the
participants?”

Before we conclude, let’s briefly review a few considerations regarding community involvement,
participation in research dissemination, and presentation of research findings. This helps us close the
loop from research to practice. Dissemination refers to the communication of research findings to a
specific target audience of stakeholders, such as policy makers, research participants, or community
members, to help them make informed decisions that will lead them to desired outcomes. To be
collaborative and more transparent, dissemination should involve stakeholders and community
members in a two-way dialogue about new research findings. When research impacts a local
community, those community members and stakeholders deserve access to the knowledge they have
contributed to through participation or other forms of engagement, such as feedback. Effective
dissemination is important as it increases the awareness of the research, possibly furthering the
acceptance and impact of the findings that may benefit those in the community or who have
participated in the research. Dissemination of research findings to the community also offers a local
perspective on the implications of the research findings. When new research findings emerge at the
university level, it often takes many years before community-based programs become aware of and
adopt these findings into practice. When we disseminate research findings to community stakeholders, those findings can be implemented immediately and locally, potentially reducing the large gap between research and
practice. Community dissemination also aids in developing culturally relevant interventions when we
engage in dialogue with those most affected by an issue the research was designed to address. Most
importantly, unless research findings are communicated in a timely manner to those who can directly
benefit from them, the impact will likely be diluted, regardless of how important the results are.

This concludes my lecture on Statistics and Research Design. Please note, this is not an exhaustive list of
the content within this domain. I encourage you to also study our written materials on Statistics and
Research Design as this will expand your understanding of the most important topics on the EPPP. Good
luck on your exam!
