Linear Regression
Using a scatter diagram, we can determine whether two variables are linearly related
to some extent. Once a reasonable linear relationship has been ascertained, we usually
express it mathematically by a straight-line equation called the linear regression line.
The linear regression line is written in slope-intercept form as

$$\hat{y} = a + bx$$

where the constants a and b represent the y-intercept and slope, respectively. The symbol
$\hat{y}$ is used here to distinguish between the value given by the regression line and an
actual observed value y for some value of x.
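Once the point estimates a and b are determined from the sample data, the linear
regression line can be used to predict the value $\hat{y}$ corresponding to any given value x.
For a sample of n pairs (x, y), the least-squares estimates are

$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$

and

$$a = \bar{y} - b\bar{x}.$$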
Solution:
         x    y   xy   x²   y²
         1    6    6    1   36
         2    4    8    4   16
         3    3    9    9    9
         4    5   20   16   25
         5    4   20   25   16
         6    2   12   36    4
Total   21   24   75   91  106
(a)
( c)
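The arithmetic behind the table can be sketched in a few lines of Python. This is only an illustration: the problem statement for this solution is not shown, so the sketch assumes that the first two columns of the table are x and y, and it uses the standard least-squares formulas for the slope and intercept. The function name `least_squares_line` is a hypothetical helper, not something from the text.

```python
# Assumption: the first two columns of the solution table are x and y.
xs = [1, 2, 3, 4, 5, 6]
ys = [6, 4, 3, 5, 4, 2]


def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line y_hat = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)                               # 21 in the table
    sum_y = sum(ys)                               # 24 in the table
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # 75 in the table
    sum_x2 = sum(x * x for x in xs)               # 91 in the table
    # Slope: b = [n*Sxy - Sx*Sy] / [n*Sx2 - Sx^2]
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = y_bar - b * x_bar
    a = sum_y / n - b * (sum_x / n)
    return a, b


a, b = least_squares_line(xs, ys)
print(f"y_hat = {a:.3f} + ({b:.3f}) x")
```

Under that reading of the table, the slope works out to -54/105, so the fitted line falls as x increases.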
LINEAR CORRELATION
We shall consider here the problem of measuring the relationship between two
variables X and Y rather than predicting a value of Y from a knowledge of the independent
variable X. For example, if X represents the amount of money spent yearly on advertising
by a retail merchandising firm and Y represents their total yearly sales, we might ask
whether a decrease in advertising is likely to be accompanied by a decrease in the yearly
sales.
[Figure: four scatter diagrams of Y against X, panels (a) to (d); panel (d) shows a quadratic pattern.]
The correlation coefficient between two variables is a measure of their linear
relationship, and a value of r = 0 implies a lack of linearity, not a lack of association.
Hence, if a strong quadratic relationship exists between X and Y, as indicated in (d), we
can still obtain a zero correlation even though there is a strong nonlinear relationship.
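A small numerical check makes this point concrete. The sketch below uses made-up data (not from the text) in which Y is exactly the square of X, with the X values placed symmetrically about zero, and computes r with the usual computational formula. The helper name `pearson_r` is an assumption for illustration.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Illustrative data only: a perfect quadratic relationship y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r}")
```

Here Y is completely determined by X, yet the numerator of r is zero, because the positive and negative halves of the parabola cancel. If the X values were not symmetric about the vertex, r would generally not be exactly zero.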
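The most widely used measure of linear correlation between two variables is called
the PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT or simply the
SAMPLE CORRELATION COEFFICIENT and is denoted by r. For n pairs (x, y) it is computed as

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$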
Since

$$SSE = \sum (y - \hat{y})^2 = \sum (y - \bar{y})^2 - b^2 \sum (x - \bar{x})^2,$$

dividing both sides of the equation by $\sum (y - \bar{y})^2$ gives the relation

$$r^2 = 1 - \frac{SSE}{\sum (y - \bar{y})^2}, \qquad \text{where } r^2 = \frac{b^2 \sum (x - \bar{x})^2}{\sum (y - \bar{y})^2}.$$

Because SSE and $\sum (y - \bar{y})^2$ are always nonnegative, $r^2$ must be between zero
and 1. Consequently r must range from –1 to +1. A value of r = -1 will occur when SSE
= 0 and all points lie exactly on a straight line having a negative slope. If all points lie
exactly on a straight line having a positive slope, once again SSE = 0 and we obtain a value
r = +1. Hence a perfect linear relationship exists between the values of X and Y in our
sample when r = ±1. If r is close to +1 or –1, the linear relationship between the two
variables is strong and we say that we have a high correlation. However, if r is close to
zero, the linear relationship between X and Y is weak or perhaps nonexistent.
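The link between r and SSE can be checked numerically. The sketch below is an illustration under stated assumptions: it reuses the six (x, y) pairs from the first solution table, taking its first two columns to be x and y, fits the least-squares line, and compares r squared with 1 - SSE / sum((y - y_bar)^2). The variable names are chosen here for the example.

```python
# Assumption: first two columns of the earlier solution table are x and y.
xs = [1, 2, 3, 4, 5, 6]
ys = [6, 4, 3, 5, 4, 2]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Centered sums of squares and cross-products.
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares line and its error sum of squares (SSE).
b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r = s_xy / (s_xx * s_yy) ** 0.5
r_squared_from_sse = 1 - sse / s_yy

print(f"r = {r:.4f}")
print(f"r^2 = {r * r:.4f}, 1 - SSE/Syy = {r_squared_from_sse:.4f}")
```

The two printed values of r squared agree up to rounding, and r stays inside the interval from -1 to +1, as the argument above requires.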
A number that expresses the proportion of the total variation in the values of the
variable Y that can be accounted for or explained by the linear relationship with the values
of the variable X is usually referred to as the sample coefficient of determination and is
denoted by $r^2$. Thus a correlation of r = 0.6 means that 0.36 or 36% of the total variation
of the values of Y in our sample is accounted for by the linear relationship with the values
of X.
r                  Interpretation
1                  Perfect positive correlation
0.91 to 0.99       Very highly positively correlated
0.71 to 0.90       Highly positively correlated
0.41 to 0.70       Marked or moderately positively correlated
0.21 to 0.40       Low or slightly positively correlated
0 to 0.20          Negligible
-0.20 to 0         Negligible
-0.21 to -0.40     Low or slightly negatively correlated
-0.41 to -0.70     Marked or moderately negatively correlated
-0.71 to -0.90     Highly negatively correlated
-0.91 to -0.99     Very highly negatively correlated
-1                 Perfect negative correlation
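The interpretation table can be expressed as a small lookup function. This is a sketch rather than a standard: the function name `interpret_r` is hypothetical, the negative bands are read as mirror images of the positive ones, and because the table lists its bounds to two decimals, the sketch assumes each band extends up to the next listed bound (so a value such as 0.905 is treated as very high).

```python
def interpret_r(r):
    """Return a verbal label for a sample correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 1.0:
        if r > 0:
            return "perfect positive correlation"
        return "perfect negative correlation"
    if size <= 0.20:
        return "negligible"
    # Assumption: each band runs up to the next bound listed in the table.
    if size <= 0.40:
        strength = "low or slightly"
    elif size <= 0.70:
        strength = "marked or moderately"
    elif size <= 0.90:
        strength = "highly"
    else:
        strength = "very highly"
    direction = "positively" if r > 0 else "negatively"
    return f"{strength} {direction} correlated"


for value in (0.95, 0.6, 0.1, -0.53, -1.0):
    print(value, "->", interpret_r(value))
```

Working with the absolute value first and attaching the direction afterwards keeps the positive and negative halves of the table consistent with each other.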
Example 1: Compute and interpret the correlation coefficient for the following data:
X 4 5 9 14 18 22 24
Y 16 22 11 16 7 3 17
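Solution:
          x     y     x²     y²     xy
          4    16     16    256     64
          5    22     25    484    110
          9    11     81    121     99
         14    16    196    256    224
         18     7    324     49    126
         22     3    484      9     66
         24    17    576    289    408
Total    96    92   1702   1464   1097

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} = \frac{(7)(1097) - (96)(92)}{\sqrt{[(7)(1702) - (96)^2][(7)(1464) - (92)^2]}} = \frac{-1153}{\sqrt{(2698)(1784)}} \approx -0.53$$

Since r = -0.53, the two variables X and Y are moderately negatively correlated.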
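The computation for Example 1 can be reproduced directly from the column totals. The sketch below assumes that the columns of the solution table are x, y, x², y² and xy in that order, which is consistent with the row values, and applies the usual computational formula for r.

```python
import math

# Column totals from the solution table for Example 1.
n = 7
sum_x, sum_y = 96, 92
sum_x2, sum_y2, sum_xy = 1702, 1464, 1097

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print(f"r = {r:.2f}")        # rounds to -0.53, the value quoted in the text
print(f"r^2 = {r * r:.3f}")  # proportion of variation in Y explained by X
```

Squaring r gives the proportion of the variation in Y that is accounted for by the linear relationship with X in this sample.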
Example 2. Compute and interpret the correlation coefficient for the aptitude scores and
grade point averages below:
Solution:
GPA AS
The grade-point averages are highly correlated with the aptitude scores.
1. The grades of a class of 9 students on a midterm report (x) and on the final
examination (y) are as follows:
x 77 50 71 72 81 94 96 99 67
y 82 66 78 34 47 85 99 99 67
3. A mathematics placement test is given to all entering freshmen at a small
college. A student who receives a grade below 35 is denied admission to the
regular mathematics course and placed in a remedial class. The placement test
scores and the final grades for 20 students who took the regular course were
recorded as follows:
4. Compute and interpret the correlation for the following grades of 6 students
selected at random.
Mathematics Grade 70 92 80 74 65 83
English Grade 74 84 63 87 78 90
5. The following data were obtained in a study of the relationship between the
weight and chest size of infants at birth:
Weight (kg)   Chest Size (cm)   Weight (kg)   Chest Size (cm)
2.75          29.5              4.32          27.7
2.15          26.3              2.31          28.3
4.41          32.2              4.30          30.3
5.52          36.5              3.71          28.7
3.21          27.2
(a) Calculate r.
(b) Graph the line on a scatter diagram.
(c) Find the point estimate of .