
CORRELATION

Kristoffer Ryan T. Gidaya, PhD, RGC


THE STRUCTURE OF A
CORRELATIONAL DESIGN
Problem
• Suppose we observe students texting or not texting in class and compare
differences in class performance (as an exam grade out of 100 points).
• In this example, the factor is texting (no, yes), and the dependent variable
is class performance (an exam grade).
• An alternative method, called the correlational method, is to treat each factor
like a dependent variable and measure the relationship between each pair of
variables.
• We could measure texting during class (number of texts sent) and class performance (an
exam grade out of 100 points) for each student in a class.
• We could then test to see if there is a relationship between the pairs of scores for each
participant.
• In this example, illustrated in Figure 15.1b, if the scores are related, then we would
expect exam scores to decrease as the number of texts sent increases.
correlation
• is a statistical procedure used to describe the strength and direction of the
linear relationship between two factors.
• The statistics we use to measure correlations are called correlation coefficients.
DESCRIBING A CORRELATION
• A correlation is used to (1) describe the pattern of data points for the values of
two factors and (2) determine whether the pattern observed in a sample is also
present in the population from which the sample was selected.
• The pattern of data points is described by the direction and strength of the
relationship between two factors.
• In behavioral research, we mostly describe the linear (or straight-line)
relationship between two factors.
scatter plot, also called a scattergram
• is a graphical display of discrete data points (x, y) used to summarize the
relationship between two variables.
• A scatter plot is used to illustrate the relationship between two variables,
denoted (x, y).
• The x variable is plotted along the x-axis of a graph, and the y variable is
plotted along the y-axis.
Data points, also called bivariate plots
• are the x- and y-coordinates for each plot in a scatter plot.
• Pairs of values for x and y
• are plotted along the x- and y-axis of a graph to see if a pattern emerges.
• The pattern that emerges is described by the value and sign of a correlation.
The Direction of a Correlation
correlation coefficient (r)
• Is used to measure the strength and direction of the linear relationship, or
correlation, between two factors.
• The value of r ranges from -1.0 to +1.0.
• Values closer to ±1.0 indicate stronger correlations, meaning that a correlation
coefficient of r = -1.0 is as strong as a correlation coefficient of r = +1.0.
• The sign of the correlation coefficient (- or +) indicates only the direction or slope of
the correlation.
positive correlation
• (0 < r ≤ +1.0)
• is a positive value of r that indicates that the values of two factors change in
the same direction:
• As the values of one factor increase, the values of the second factor also increase; as the values of one
factor decrease, the values of the second factor also decrease
• a perfect positive correlation
• Occurs when each data point falls
exactly on a straight line, although this is
rare.
• More commonly, a positive
correlation is greater than 0 but less
than + 1.0, where the values of two
factors change in the same direction,
but not all data points fall exactly on a
straight line.
Negative correlation
• (-1.0 ≤ r < 0)
• is a negative value of r that indicates that the values of two factors change in
different directions, meaning that as the values of one factor increase, the
values of the second factor decrease.
• perfect negative correlation
• occurs when each data point falls exactly
on a straight line, although this is also
rare.
• More commonly, a negative correlation
is greater than -1.0 but less than 0,
where the values of two factors change
in the opposite direction, but not all
data points fall exactly on a straight
line.
The Strength of a Correlation
• zero correlation (r = 0) means that there is no linear pattern or relationship between
two factors.
• This outcome is rare because usually by mere chance, at least some values of one
factor, X, will show some pattern or relationship with values of a second factor, Y.
• The closer a correlation coefficient is to r = 0, the weaker the correlation and the
less likely that two factors are related; the closer a correlation coefficient is to r =
±1.0, the stronger the correlation and the more likely that two factors are related.
• The strength of a correlation reflects how consistently
scores for each factor change.
• When plotted in a scatter plot, scores are more consistent the
closer they fall to a regression line, or the straight line that
best fits a set of data points.
• The best-fitting straight line minimizes the total distance of all
data points that fall from it.
regression line
• is the best-fitting straight line to a set of data points. A best-fitting line is the
line that minimizes the distance of all data points that fall from it.
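To make the "best fit" idea concrete, here is a minimal Python sketch (not from the original slides; the data are invented for illustration) that computes the least-squares line using the standard formulas b = SS_XY / SS_X for the slope and a = ȳ − b·x̄ for the intercept:

```python
def best_fitting_line(x, y):
    """Least-squares regression line y-hat = a + b*x: the line that
    minimizes the sum of squared vertical distances from the data points."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)                       # variation in X
    ss_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # covariation
    b = ss_xy / ss_x   # slope
    a = my - b * mx    # intercept
    return a, b

# All four points fall exactly on y = 2x, so every distance from the line is zero.
a, b = best_fitting_line([1, 2, 3, 4], [2, 4, 6, 8])
print(a, b)  # 0.0 2.0
```

With real behavioral data the points rarely fall exactly on the line; the fitted line is simply the one from which the total squared distance is smallest.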
PEARSON CORRELATION
COEFFICIENT (r)
• also called the Pearson product-moment correlation coefficient
• is a measure of the direction and strength of the linear relationship of two
factors in which the data for both factors are measured on an interval or
ratio scale of measurement.
We can locate a sample of data points by
converting them to z scores and computing
the following formula:

r = Σ(z_X × z_Y) / (n − 1)

Notice that the general formula for the Pearson correlation
coefficient is similar to that for computing variance.

• This formula has the drawback of requiring that each score be transformed
into a z score, and consequently, an equivalent formula that uses raw scores is
used more often to calculate the Pearson correlation coefficient.
• Writing this formula in terms of the sum of the squares gives us the
following correlation coefficient:

r = SS_XY / √(SS_X × SS_Y)
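As a sketch of the raw-score computation (Python, not part of the original slides; the sample data are made up for illustration), the sum-of-squares formula can be computed directly:

```python
import math

def pearson_r(x, y):
    """Pearson correlation from raw scores: r = SS_XY / sqrt(SS_X * SS_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)                       # X varies alone
    ss_y = sum((yi - my) ** 2 for yi in y)                       # Y varies alone
    ss_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # X and Y vary together
    return ss_xy / math.sqrt(ss_x * ss_y)

# Hypothetical scores: every point falls on a straight positive-sloping line.
months_attended = [1, 2, 3, 4, 5]
classes_missed = [2, 4, 6, 8, 10]
print(pearson_r(months_attended, classes_missed))  # 1.0
```

Because every data point here falls exactly on a straight line, the covariation in the numerator equals the total variation in the denominator, giving r = 1.0.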
• To illustrate how we can use this formula to measure the distance that points
fall from the regression line, consider the example, plotted in Figure 15.6, in
which we seek to determine the relationship between the number of months
that students attended college and the number of classes they missed.
• Notice that some data points fall on the line and others fall some distance
from the line. The correlation coefficient measures the variance in the
distance that data points fall from the regression line.
• The value in the numerator of the Pearson correlation coefficient reflects the
extent to which values on the x-axis (X) and y-axis (Y) vary together.
• The extent to which the values of two factors vary together is called
covariance.
• The extent to which values of X and Y vary independently, or separately, is
placed in the denominator.
The formula for r can be stated as follows:

r = covariance of X and Y / √(variance of X × variance of Y)
• The correlation coefficient, r, measures the variance of X and the variance of
Y, which constitutes the total variance that can be measured.
• The total variance is placed in the denominator of the formula for r.
• The covariance in the numerator is the amount or proportion of the total
variance that is shared by X and Y.
• The larger the covariance, the closer data points will fall to the regression line.
• When all data points for X and Y fall exactly on a regression line, the
covariance equals the total variance, making the formula for r equal to +1.0
or -1.0, depending on the direction of the relationship.
• The farther that data points fall from the regression line, the smaller the
covariance will be compared to the total variance in the denominator,
resulting in a value of r closer to 0.
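The covariance-versus-total-variance idea can be checked numerically with a small Python sketch (the data are invented for illustration, not taken from the slides):

```python
def covariance(x, y):
    """Sample covariance: how much X and Y vary together."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def variance(v):
    """Sample variance: how much one factor varies on its own."""
    n = len(v)
    m = sum(v) / n
    return sum((vi - m) ** 2 for vi in v) / (n - 1)

# Perfect negative line: the covariance magnitude equals the total
# variance, so r = -1.0 exactly.
x = [1, 2, 3]
y_line = [5, 3, 1]
r_perfect = covariance(x, y_line) / (variance(x) * variance(y_line)) ** 0.5

# Move one point off the line: the covariance shrinks relative to the
# total variance, pulling r away from -1.0 and toward 0.
y_off = [5, 4, 1]
r_off = covariance(x, y_off) / (variance(x) * variance(y_off)) ** 0.5
```

Running this gives r_perfect = -1.0 for the perfectly linear points and a value between -1.0 and 0 for the scattered points, matching the description above.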
We compute the Pearson correlation coefficient
for data measured on an interval or ratio scale
of measurement, following these steps:
• Step 1: Compute preliminary calculations.
• Step 2: Compute the Pearson correlation coefficient (r).
Sample Problem
• An area of research of particular interest is the relationship between mood and
appetite (Hammen & Keenan-Miller, 2013; Privitera, Antonelli, & Creary, 2013;
Wagner, Boswell, Kelley, & Heatherton, 2012). As an example of one such study
from this area of research, suppose a health psychologist tests if mood and eating
are related by recording data for each variable in a sample of 8 participants. She
measures mood using a 9-point rating scale in which higher ratings indicate better
mood. She measures eating as the average number of daily calories that each
participant consumed in the previous week. The results for this study are listed in
Figure 15.8.
• We will compute the Pearson correlation coefficient using these data.
Step 1: Compute preliminary calculations.
• We begin by making the preliminary calculations. The signs (+ and -) of the
values we measure for each factor are essential to making accurate
computations. The goal is to find the sum of squares needed to complete the
formula for r.
Effect Size: The Coefficient of Determination
• A correlation coefficient ranges from -1 to +1, so it can be negative. To
compute proportion of variance as an estimate of effect size, we square the
correlation coefficient r.
coefficient of determination
• In our example, we want to measure the proportion of variance in calories
consumed (eating) that can be explained by ratings of mood. The coefficient of
determination for the data is r² = (-.744)² ≈ .55.

• In terms of proportion of variance, we conclude that about 55% of the variability
in calories consumed can be explained by participants' ratings of their mood.
Hypothesis Testing: Testing for
Significance
• We can also follow the steps to hypothesis testing to test for significance.
• By doing so, we can determine whether the correlation observed in a sample
is present in the population from which the sample was selected.
Step 1: State the hypotheses.
• To test for the significance of a correlation, the null hypothesis is that there
is no relationship between two factors (a zero correlation) in the population.
• The alternative hypothesis is that there is a relationship between two factors
(a positive or negative correlation) in the population. For a population, the
correlation coefficient is symbolized by the Greek letter rho, ρ.
• We can therefore state the hypotheses for our example as follows:

H0: ρ = 0 (mood is not related to eating in the population)
H1: ρ ≠ 0 (mood is related to eating in the population)
Step 2: Set the criteria for a decision.
• We will compute a two-tailed test at a .05 level of significance. The degrees
of freedom are the number of scores that are free to vary for X and for Y.
• All X scores except one are free to vary, and all Y scores except one are free
to vary. Hence, the degrees of freedom for a correlation are n - 2.
• In our example, n = 8; therefore, the degrees of freedom for this test are 8 -
2 = 6.
• To locate the critical values for this test, look in Table B.5 in Appendix B.
• The alpha levels for one-tailed and two-tailed tests are given in each column
and the degrees of freedom in the rows in Table B.5 in Appendix B.
• At a .05 level of significance, the critical values for this test are ±.707. The
probability is less than 5% that we will obtain a correlation stronger than r =
±.707 when n = 8. If the value of r exceeds ±.707 (that is, if |r| > .707), then
we reject the null hypothesis; otherwise, we retain the null hypothesis.
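The decision rule can be sketched as a small Python check (the critical value ±.707 comes from the table lookup described above; the rule itself is not code from the slides):

```python
def decision(r, critical_value=0.707):
    """Two-tailed decision rule at alpha = .05 with df = n - 2 = 6:
    reject the null hypothesis when |r| exceeds the critical value."""
    return "reject H0" if abs(r) > critical_value else "retain H0"

print(decision(-0.744))  # reject H0: |-.744| exceeds .707
print(decision(0.320))   # retain H0: |.320| does not exceed .707
```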
Step 3: Compute the test statistic.
• The correlation coefficient r is the test statistic for the hypothesis test. We
already measured this: r = -.744.
Step 4: Make a decision.
• To decide whether to retain or reject the null hypothesis, we compare the
value of the test statistic to the critical values.
• Because r = -.744 exceeds the lower critical value, we reject the null
hypothesis. We conclude that the correlation observed between mood and
eating reflects a relationship between mood and eating in the population. If
we were to report this result in a research journal, it would look something
like this:
• Using the Pearson correlation coefficient, a significant relationship between mood and
eating was evident, r = -.744, p < .05.
