Correlation Coefficient
Suppose a safety inspector wants to determine whether a relationship exists between the number of
hours of training for an employee and the number of accidents involving that employee. Or suppose a
psychologist wants to know whether a relationship exists between the number of hours a person sleeps
each night and that person’s reaction time. How would he or she determine if any relationship exists?
In this section, you will study how to describe what type of relationship, or correlation, exists between
two quantitative variables and how to determine whether the correlation is significant.
DEFINITION
A correlation is a relationship between two variables. The data can be
represented by the ordered pairs (x, y), where x is the independent (or
explanatory) variable and y is the dependent (or response) variable.
The graph of ordered pairs (x, y) is called a scatter plot. In a scatter plot,
the ordered pairs (x, y) are graphed as points in a coordinate plane. The
independent (explanatory) variable x is measured on the horizontal axis, and
the dependent (response) variable y is measured on the vertical axis. A scatter
plot can be used to determine whether a linear (straight line) correlation
exists between two variables. The scatter plots below show several types of
correlation.
[Figure: scatter plots showing types of correlation. Recovered captions: "As x increases, y tends to decrease." and "As x increases, y tends to increase."]
GDP (in trillions of dollars), x    CO2 emissions, y
2.2                                 400.9
0.8                                 253.0
1.5                                 318.6
2.4                                 496.8
5.9                                 1180.6
[Scatter plot: CO2 emissions (vertical axis) versus GDP in trillions of dollars (horizontal axis)]
... is a positive linear correlation between the variables.
Interpretation Reading from left to right, as the gross domestic
products increase, the carbon dioxide emissions tend to increase.
Hours of exercise, x 12 3 0 6 10 2 18 14 15 5
GPA, y 3.6 4.0 3.9 2.5 2.4 2.2 3.7 3.0 1.8 3.1
Solution The scatter plot is shown at the left. From the scatter plot, it
appears that there is no linear correlation between the variables.
Interpretation The number of hours a student exercises each week does not
appear to be related to the student’s grade point average.
[Scatter plot: grade point average (vertical axis) versus hours of exercise (horizontal axis)]
Constructing a Scatter Plot Using Technology
Old Faithful, located in Yellowstone National Park, is the world’s most famous
geyser. The durations (in minutes) of several of Old Faithful’s eruptions
and the times (in minutes) until the next eruption are shown in the table
below. Use technology to display the data in a scatter plot. Describe the type
of correlation.
Duration, x   Time, y   Duration, x   Time, y
1.80          56        3.78          79
1.82          58        3.83          85
1.90          62        3.88          80
1.93          56        4.10          89
1.98          57        4.27          90
2.05          57        4.30          89
2.13          60        4.43          89
2.30          57        4.47          86
2.37          61        4.53          89
2.82          73        4.55          86
3.13          76        4.60          92
3.27          77        4.63          91
3.65          77
Solution
MINITAB, Excel, and the TI-84 Plus each have features for graphing scatter
plots. Try using this technology to draw the scatter plots shown. From the
scatter plots, it appears that the variables have a positive linear correlation.
[Figure: two technology-generated scatter plots of the Old Faithful data; horizontal axis: Duration (in minutes)]
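For readers without MINITAB, Excel, or a TI-84 Plus, the same data can be displayed with a few lines of code. The following is only a minimal sketch in Python using the standard library; the function name text_scatter and the grid size are assumptions made for this illustration and are not part of the original example. It draws a rough plain-text scatter plot of the Old Faithful durations and times.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a rough plain-text scatter plot of the (x, y) pairs.

    Hypothetical helper for illustration only; width and height are the
    number of character columns and rows in the plotting area.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid dividing by zero when every value on an axis is the same.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # The first grid row is printed at the top, so flip the row index
        # to place the largest y values at the top of the plot.
        grid[height - 1 - row][col] = "*"
    lines = ["|" + "".join(r).rstrip() for r in grid]
    lines.append("+" + "-" * width)
    return "\n".join(lines)


# Old Faithful data from the table: eruption duration (minutes) and
# time until the next eruption (minutes).
durations = [1.80, 1.82, 1.90, 1.93, 1.98, 2.05, 2.13, 2.30, 2.37, 2.82,
             3.13, 3.27, 3.65, 3.78, 3.83, 3.88, 4.10, 4.27, 4.30, 4.43,
             4.47, 4.53, 4.55, 4.60, 4.63]
times = [56, 58, 62, 56, 57, 57, 60, 57, 61, 73,
         76, 77, 77, 79, 85, 80, 89, 90, 89, 89,
         86, 89, 86, 92, 91]

plot = text_scatter(durations, times)
print(plot)
```

The points run from the lower left of the grid to the upper right, which is the pattern expected for a positive linear correlation. Pairs that round to the same grid cell are drawn as a single mark, so this display is coarser than the plots produced by the tools named above.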
DEFINITION
The correlation coefficient is a measure of the strength and the direction of a
linear relationship between two variables. The symbol r represents the sample
correlation coefficient. A formula for r is
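\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^{2} - \left(\sum x\right)^{2}}\,\sqrt{n\sum y^{2} - \left(\sum y\right)^{2}}} \qquad \text{Sample correlation coefficient}
\]
where n is the number of pairs of data.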
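As a check on how the formula for r is applied, the sketch below computes it term by term in Python. The function name sample_correlation is an assumption made for this illustration. The data are the hours of exercise and GPA values from the earlier example and the five GDP and CO2 emission pairs that appear in this section.

```python
from math import sqrt


def sample_correlation(xs, ys):
    """Compute the sample correlation coefficient r from paired data.

    Hypothetical helper for illustration; it follows the sums in the
    formula for r directly.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    # r is undefined when either variable has no variation.
    if x_term <= 0 or y_term <= 0:
        raise ValueError("r is undefined when a variable is constant")
    return numerator / (sqrt(x_term) * sqrt(y_term))


# Hours of exercise and GPA from the earlier example.
hours = [12, 3, 0, 6, 10, 2, 18, 14, 15, 5]
gpa = [3.6, 4.0, 3.9, 2.5, 2.4, 2.2, 3.7, 3.0, 1.8, 3.1]

# The five GDP (trillions of dollars) and CO2 emission pairs shown
# in this section.
gdp = [2.2, 0.8, 1.5, 2.4, 5.9]
co2 = [400.9, 253.0, 318.6, 496.8, 1180.6]

r_exercise = sample_correlation(hours, gpa)
r_gdp = sample_correlation(gdp, co2)
print(round(r_exercise, 3), round(r_gdp, 3))
```

For the exercise data, r comes out weakly negative (about -0.16), which agrees with the scatter plot's suggestion of no linear correlation. For the five GDP pairs, r is close to 1 (about 0.99), which agrees with the positive linear correlation described for that example.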
[Figure: six scatter plots. Recovered axis labels, top row: a plot against number of adult movie tickets; shoe size versus height (in inches); a plot against income per year (in thousands of dollars). Bottom row: exam score and test grade plotted against number incorrect and number of absences; height (in inches) versus IQ score.]