
AN OVERVIEW OF CORRELATION

Suppose a safety inspector wants to determine whether a relationship exists between the number of
hours of training for an employee and the number of accidents involving that employee. Or suppose a
psychologist wants to know whether a relationship exists between the number of hours a person sleeps
each night and that person’s reaction time. How would he or she determine if any relationship exists?
In this section, you will study how to describe what type of relationship, or correlation, exists between
two quantitative variables and how to determine whether the correlation is significant.

DEFINITION
A correlation is a relationship between two variables. The data can be
represented by the ordered pairs (x, y), where x is the independent (or
explanatory) variable and y is the dependent (or response) variable.

The graph of ordered pairs (x, y) is called a scatter plot. In a scatter plot,
the ordered pairs (x, y) are graphed as points in a coordinate plane. The
independent (explanatory) variable x is measured on the horizontal axis, and
the dependent (response) variable y is measured on the vertical axis. A scatter
plot can be used to determine whether a linear (straight line) correlation
exists between two variables. The scatter plots below show several types of
correlation.

[Figure: four scatter plots. Negative Linear Correlation (as x increases, y tends to decrease); Positive Linear Correlation (as x increases, y tends to increase); No Correlation; Nonlinear Correlation.]


Constructing a Scatter Plot

An economist wants to determine whether there is a linear relationship
between a country’s gross domestic product (GDP) and carbon dioxide
(CO2) emissions. The data are shown in the table below. Display the data
in a scatter plot and describe the type of correlation. (Source: World Bank and
U.S. Energy Information Administration)

GDP (in trillions      CO2 emissions (in millions
of dollars), x         of metric tons), y
1.7                    552.6
1.2                    462.3
2.5                    475.4
2.8                    374.3
3.6                    748.5
2.2                    400.9
0.8                    253.0
1.5                    318.6
2.4                    496.8
5.9                    1180.6

Solution The scatter plot is shown below. From the scatter plot, it appears
that there is a positive linear correlation between the variables.

[Figure: scatter plot of CO2 emissions (in millions of metric tons), y, against GDP (in trillions of dollars), x.]

Interpretation Reading from left to right, as the gross domestic products
increase, the carbon dioxide emissions tend to increase.

Constructing a Scatter Plot


A student conducts a study to determine whether there is a linear relationship
between the number of hours a student exercises each week and the student’s
grade point average (GPA). The data are shown in the table below. Display
the data in a scatter plot and describe the type of correlation.

Hours of exercise, x 12 3 0 6 10 2 18 14 15 5

GPA, y 3.6 4.0 3.9 2.5 2.4 2.2 3.7 3.0 1.8 3.1

Solution The scatter plot is shown below. From the scatter plot, it
appears that there is no linear correlation between the variables.
Interpretation The number of hours a student exercises each week does not
appear to be related to the student’s grade point average.

[Figure: scatter plot of grade point average, y, against hours of exercise, x.]

Constructing a Scatter Plot Using Technology

Old Faithful, located in Yellowstone National Park, is the world’s most famous
geyser. The durations (in minutes) of several of Old Faithful’s eruptions
and the times (in minutes) until the next eruption are shown in the table
below. Use technology to display the data in a scatter plot. Describe the type
of correlation.

Duration, x    Time, y
1.80           56
1.82           58
1.90           62
1.93           56
1.98           57
2.05           57
2.13           60
2.30           57
2.37           61
2.82           73
3.13           76
3.27           77
3.65           77
3.78           79
3.83           85
3.88           80
4.10           89
4.27           90
4.30           89
4.43           89
4.47           86
4.53           89
4.55           86
4.60           92
4.63           91

Solution
MINITAB, Excel, and the TI-84 Plus each have features for graphing scatter
plots. Try using this technology to draw the scatter plots shown. From the
scatter plots, it appears that the variables have a positive linear correlation.

[Figure: MINITAB, Excel, and TI-84 Plus scatter plots of time until the next eruption (in minutes), y, against duration (in minutes), x.]

Interpretation Reading from left to right, as the durations of the eruptions
increase, the times until the next eruption tend to increase.
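
For readers without MINITAB, Excel, or a TI-84 Plus, the same picture can be approximated in plain text. The following is a minimal sketch in Python using only the standard library. The function name `ascii_scatter` and its `width` and `height` parameters are illustrative choices, not part of the original example, and the sketch assumes the x-values and the y-values are not all identical.

```python
def ascii_scatter(xs, ys, width=50, height=15):
    """Return a rough text-mode scatter plot of the (x, y) pairs.

    Illustrative helper (not from the textbook). Assumes xs and ys each
    contain at least two distinct values, so the scaling never divides by 0.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale x to a column index between 0 and width - 1.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        # Row 0 is the top of the plot, so larger y-values get smaller rows.
        row = (height - 1) - round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    body = "\n".join("|" + "".join(row) for row in grid)
    return body + "\n+" + "-" * width

# Old Faithful data from the table: duration (x) and time until next eruption (y).
duration = [1.80, 1.82, 1.90, 1.93, 1.98, 2.05, 2.13, 2.30, 2.37, 2.82,
            3.13, 3.27, 3.65, 3.78, 3.83, 3.88, 4.10, 4.27, 4.30, 4.43,
            4.47, 4.53, 4.55, 4.60, 4.63]
time_to_next = [56, 58, 62, 56, 57, 57, 60, 57, 61, 73,
                76, 77, 77, 79, 85, 80, 89, 90, 89, 89,
                86, 89, 86, 92, 91]

plot = ascii_scatter(duration, time_to_next)
print(plot)
```

The points should run from the lower left to the upper right of the grid, which is the pattern the solution describes as a positive linear correlation. Points that land in the same grid cell share one `*`, so the text plot can show fewer marks than there are data pairs.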
CORRELATION COEFFICIENT
Interpreting correlation using a scatter plot can be subjective. A more precise
way to measure the type and strength of a linear correlation between two
variables is to calculate the correlation coefficient. Although a formula for the
sample correlation coefficient is given, it is more convenient to use technology
to calculate this value.

DEFINITION
The correlation coefficient is a measure of the strength and the direction of a
linear relationship between two variables. The symbol r represents the sample
correlation coefficient. A formula for r is
$$
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} \qquad \text{Sample correlation coefficient}
$$

where n is the number of pairs of data.


The population correlation coefficient is represented by ρ (the lowercase
Greek letter rho, pronounced “row”).

The range of the correlation coefficient is -1 to 1, inclusive. When x and y
have a strong positive linear correlation, r is close to 1. When x and y have a
strong negative linear correlation, r is close to -1. When x and y have perfect
positive linear correlation or perfect negative linear correlation, r is equal to 1
or -1, respectively. When there is no linear correlation, r is close to 0. It is
important to remember that when r is close to 0, it does not mean that there is no
relation between x and y, just that there is no linear relation. Several examples
are shown below.

[Figure: six scatter plots.
Perfect positive correlation, r = 1 (total cost in dollars against number of adult movie tickets).
Strong positive correlation, r = 0.81 (shoe size against height in inches).
Weak positive correlation, r = 0.45 (amount spent on milk per year in dollars against income per year in thousands of dollars).
Perfect negative correlation, r = -1 (x-axis: number incorrect).
Strong negative correlation, r = -0.92 (x-axis: number of absences).
No correlation, r = 0.04 (height in inches against IQ score).]
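
The caution that r close to 0 rules out only a linear relation can be checked numerically. The following is a small sketch in Python. The data set is made up for illustration (it is not from the text): y is completely determined by x through y = x², yet the formula for r gives 0.

```python
import math

# Hypothetical data: y depends on x exactly, but the relation is a parabola.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Sample correlation coefficient, following the formula for r.
r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
)
print(r)
```

Here the numerator is 7(0) - (0)(28) = 0, so r = 0 even though knowing x tells you y exactly. This is the nonlinear case: a scatter plot would show a clear curve that r does not detect.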
GUIDELINES
Calculating a Correlation Coefficient
IN WORDS                                                   IN SYMBOLS
1. Find the sum of the x-values.                           Σx
2. Find the sum of the y-values.                           Σy
3. Multiply each x-value by its corresponding
   y-value and find the sum.                               Σxy
4. Square each x-value and find the sum.                   Σx²
5. Square each y-value and find the sum.                   Σy²
6. Use these five sums to calculate the
   correlation coefficient.

$$
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}
$$
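
The six steps above can be sketched as a short Python function. This is one possible transcription of the guidelines, not a reference implementation. The name `correlation_coefficient` and the small data set used to try it are assumptions made for illustration.

```python
import math

def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r, computed from the five sums.

    Illustrative helper. Assumes xs and ys have the same length and that
    neither list is constant (otherwise the denominator would be 0).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)                                    # number of pairs of data
    sum_x = sum(xs)                                # Step 1: sum of x-values
    sum_y = sum(ys)                                # Step 2: sum of y-values
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # Step 3: sum of products xy
    sum_x2 = sum(x * x for x in xs)                # Step 4: sum of squares of x
    sum_y2 = sum(y * y for y in ys)                # Step 5: sum of squares of y
    # Step 6: combine the five sums.
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical check: points that lie exactly on the line y = 2x.
r_line = correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])
print(round(r_line, 6))
```

Because the four made-up points lie exactly on a rising line, the result should equal 1 up to floating-point rounding, which matches the description of perfect positive linear correlation.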

Calculating a Correlation Coefficient


Calculate the correlation coefficient for the gross domestic products and carbon
dioxide emissions data in Example 1. Interpret the result in the context of the data.

Solution Use a table to help calculate the correlation coefficient.

GDP (in trillions    CO2 emissions (in millions
of dollars), x       of metric tons), y            xy            x²         y²
1.7                  552.6                         939.42        2.89       305,366.76
1.2                  462.3                         554.76        1.44       213,721.29
2.5                  475.4                         1188.5        6.25       226,005.16
2.8                  374.3                         1048.04       7.84       140,100.49
3.6                  748.5                         2694.6        12.96      560,252.25
2.2                  400.9                         881.98        4.84       160,720.81
0.8                  253.0                         202.4         0.64       64,009
1.5                  318.6                         477.9         2.25       101,505.96
2.4                  496.8                         1192.32       5.76       246,810.24
5.9                  1180.6                        6965.54       34.81      1,393,816.36
Σx = 24.6            Σy = 5263                     Σxy = 16,145.46   Σx² = 79.68   Σy² = 3,412,308.32

With these sums and n = 10, the correlation coefficient is


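$$
\begin{aligned}
r &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} \\
  &= \frac{10(16{,}145.46) - (24.6)(5263)}{\sqrt{10(79.68) - (24.6)^2}\,\sqrt{10(3{,}412{,}308.32) - (5263)^2}} \\
  &= \frac{31{,}984.8}{\sqrt{191.64}\,\sqrt{6{,}423{,}914.2}} \\
  &\approx 0.912.
\end{aligned}
$$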
The result r ≈ 0.912 suggests a strong positive linear correlation.
Interpretation As the gross domestic product increases, the carbon dioxide
emissions tend to increase.
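
As a cross-check on the hand calculation, the sums and the value of r can be recomputed with technology. The following Python sketch uses only the standard library. The variable names are illustrative, and `math.fsum` is used simply to limit floating-point rounding in the sums.

```python
import math

# GDP (trillions of dollars) and CO2 emissions (millions of metric tons)
# copied from the table in the example.
gdp = [1.7, 1.2, 2.5, 2.8, 3.6, 2.2, 0.8, 1.5, 2.4, 5.9]
co2 = [552.6, 462.3, 475.4, 374.3, 748.5, 400.9, 253.0, 318.6, 496.8, 1180.6]

n = len(gdp)
sum_x = math.fsum(gdp)
sum_y = math.fsum(co2)
sum_xy = math.fsum(x * y for x, y in zip(gdp, co2))
sum_x2 = math.fsum(x * x for x in gdp)
sum_y2 = math.fsum(y * y for y in co2)

# Same formula as in the worked solution.
numerator = n * sum_xy - sum_x * sum_y
r = numerator / (math.sqrt(n * sum_x2 - sum_x ** 2)
                 * math.sqrt(n * sum_y2 - sum_y ** 2))

print("sum x   =", round(sum_x, 2))
print("sum y   =", round(sum_y, 2))
print("sum xy  =", round(sum_xy, 2))
print("sum x^2 =", round(sum_x2, 2))
print("sum y^2 =", round(sum_y2, 2))
print("r       =", round(r, 3))
```

Up to rounding, the printed sums should agree with the bottom row of the table, and r should round to the 0.912 found above.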
