
part(7)

From page (66) to page (82)

Second Year - Descriptive Statistics -


2021-2022 - Dr. Moustafa Salem
Linear Regression and Correlation

Correlation Analysis
• Correlation Analysis is the study of the relationship between
variables. It is also defined as a group of techniques to measure
the association between two variables.
• A Scatter Diagram is a chart that portrays the relationship
between the two variables. It is the usual first step in
correlation analysis.
– The Dependent Variable is the variable being predicted or
estimated.
– The Independent Variable provides the basis for estimation.
It is the predictor variable.

Example (1)
The sales manager of Copier Sales of
America, which has a large sales force throughout the United
States and Canada, wants to determine whether there is a
relationship between the number of sales calls made in a month
and the number of copiers sold that month. The manager
selects a random sample of 10 representatives and determines
the number of sales calls each representative made last month
and the number of copiers sold.

Describe correlation between the two variables

Scatter Diagram

The Coefficient of Correlation (r)

The Coefficient of Correlation (r) is a measure of the
strength of the linear relationship between two variables. It
requires interval or ratio-scaled data.
• It can range from −1.00 to +1.00.
• Values of −1.00 or +1.00 indicate perfect and strong
correlation.
• Values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship and
positive values indicate a direct relationship.

Correlation Coefficient - Interpretation

Perfect Correlation

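Pearson Correlation Coefficient
The sample coefficient is denoted r and the population coefficient ρ.

The product moment formula
$$ r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2}\,\sqrt{\sum (Y-\bar{Y})^2}} $$

The short computation formula
$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} $$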
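The two formulas for r are algebraically the same. A short derivation, using only the definitions of the means, X̄ = ΣX/n and Ȳ = ΣY/n, shows why:

```latex
\begin{aligned}
\sum (X-\bar{X})(Y-\bar{Y})
  &= \sum XY - \bar{Y}\sum X - \bar{X}\sum Y + n\bar{X}\bar{Y}
   = \sum XY - \frac{(\sum X)(\sum Y)}{n}, \\
\sum (X-\bar{X})^2 &= \sum X^2 - \frac{(\sum X)^2}{n}, \qquad
\sum (Y-\bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}.
\end{aligned}

% Substitute into the product moment formula, then multiply the
% numerator and the denominator by n (note that \sqrt{n}\sqrt{n} = n):
r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}
         {\sqrt{\sum X^2 - \frac{(\sum X)^2}{n}}\,
          \sqrt{\sum Y^2 - \frac{(\sum Y)^2}{n}}}
  = \frac{n\sum XY - (\sum X)(\sum Y)}
         {\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}
```

The short formula is convenient for hand computation because it needs only the column totals ΣX, ΣY, ΣXY, ΣX² and ΣY², not the individual deviations.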

Coefficient of Determination (r²)
The coefficient of determination (r²) is the proportion
of the total variation in the dependent variable (Y) that
is explained or accounted for by the variation in the
independent variable (X). It is the square of the
coefficient of correlation.
• It ranges from 0 to 1.00.
• It does not give any information on the direction
of the relationship between the variables.

Example (2)
The sales manager of Copier Sales of
America, which has a large sales force throughout the
United States and Canada, wants to determine whether
there is a relationship between the number of sales
calls made in a month and the number of copiers sold
that month. The manager selects a random sample of
10 representatives and determines the number of sales
calls each representative made last month and the
number of copiers sold. Compute the correlation
coefficient and the coefficient of determination.
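It makes no difference which variable is called X and which is called Y,
because the correlation coefficient is symmetric in the two variables.

We use the product moment formula, whose denominator needs
$\sqrt{\sum (X-\bar{X})^2}$ and $\sqrt{\sum (Y-\bar{Y})^2}$.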

How do we interpret a correlation of 0.759?


First, it is positive, so we see there is a direct relationship
between the number of sales calls and the number of copiers
sold. The value of 0.759 is fairly close to 1.00, so we conclude
that the correlation is strong.

However, does this mean that more sales calls cause more sales?
No, we have not demonstrated cause and effect here, only that
the two variables (sales calls and copiers sold) are related.

The coefficient of determination, r², is 0.576, found by
(0.759)². This is a proportion or a percent; we can say that 57.6
percent of the variation in the number of copiers sold is
explained, or accounted for, by the variation in the number
of sales calls.
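As a check on the figures quoted above, here is a minimal Python sketch of the product moment formula. The data lists are read off the X and Y columns of the table in Example (5); the function name `pearson_r` is an illustrative choice, not something from the notes.

```python
import math

# Sales calls (X) and copiers sold (Y) for the 10 representatives,
# read off the X and Y columns of the table in Example (5).
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]


def pearson_r(x, y):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return cross / (math.sqrt(ss_x) * math.sqrt(ss_y))


r = pearson_r(calls, copiers)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")  # r = 0.759, r^2 = 0.576
```

Swapping the two arguments gives the same value, since the formula treats X and Y symmetrically.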

Example (3)
Consider the following data on the respective masses X and Y
of a sample of 12 fathers and their oldest sons.
(a) Compute the covariance.
(b) Find the correlation coefficient using the product
moment formula.
(c) Find the correlation coefficient using the short
computational formula.
X 65 63 67 64 68 62 70 66 68 67 69 71
Y 68 66 68 65 69 66 68 65 71 67 68 70
 X    Y    X²     XY     Y²    X−X̄    Y−Ȳ   (X−X̄)²   (X−X̄)(Y−Ȳ)   (Y−Ȳ)²
65   68   4225   4420   4624   -1.7    0.4     2.89       -0.68        0.16
63   66   3969   4158   4356   -3.7   -1.6    13.69        5.92        2.56
67   68   4489   4556   4624    0.3    0.4     0.09        0.12        0.16
64   65   4096   4160   4225   -2.7   -2.6     7.29        7.02        6.76
68   69   4624   4692   4761    1.3    1.4     1.69        1.82        1.96
62   66   3844   4092   4356   -4.7   -1.6    22.09        7.52        2.56
70   68   4900   4760   4624    3.3    0.4    10.89        1.32        0.16
66   65   4356   4290   4225   -0.7   -2.6     0.49        1.82        6.76
68   71   4624   4828   5041    1.3    3.4     1.69        4.42       11.56
67   67   4489   4489   4489    0.3   -0.6     0.09       -0.18        0.36
69   68   4761   4692   4624    2.3    0.4     5.29        0.92        0.16
71   70   5041   4970   4900    4.3    2.4    18.49       10.32        5.76
800  811  53418  54107  54849   0      0      84.68       40.34       38.92

The last row gives the column sums.

$$ \bar{X} = \frac{\sum X}{n} = \frac{800}{12} \approx 66.7 $$

$$ \bar{Y} = \frac{\sum Y}{n} = \frac{811}{12} \approx 67.6 $$
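(a) Covariance
The covariance is the sum of cross-products divided by n − 1 for a
sample (or by n when the data are treated as a population):
$$ \operatorname{Cov}(X, Y) = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{n-1} = \frac{40.34}{11} \approx 3.67 $$
Dividing by n = 12 instead gives about 3.36.

(b) The correlation coefficient using the product moment
formula:
$$ r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2}\,\sqrt{\sum (Y-\bar{Y})^2}} = \frac{40.34}{\sqrt{84.68}\,\sqrt{38.92}} \approx 0.703 $$
This shows that there is a strong linear correlation between the
variables.

(c) The correlation coefficient using the short computational
formula:
$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} $$
$$ r = \frac{12(54107) - (800)(811)}{\sqrt{12(53418) - (800)^2}\,\sqrt{12(54849) - (811)^2}} = \frac{484}{\sqrt{1016}\,\sqrt{467}} \approx 0.703 $$
This shows that there is a strong linear correlation between the
variables.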
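The three parts of Example (3) can be checked with a short Python sketch. Which divisor the course uses for the covariance is an assumption here, so both the n − 1 (sample) and n (population) versions are computed. The variable names are illustrative. Because the code uses unrounded means, its sums can differ from the hand table in the second decimal place.

```python
import math

# Data of Example (3): X = fathers, Y = oldest sons.
fathers = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
sons = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]
n = len(fathers)

x_bar = sum(fathers) / n
y_bar = sum(sons) / n

# Sums of squares and cross-products about the (unrounded) means.
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(fathers, sons))
ss_x = sum((x - x_bar) ** 2 for x in fathers)
ss_y = sum((y - y_bar) ** 2 for y in sons)

# (a) Covariance under both conventions (the divisor is an assumption).
cov_sample = cross / (n - 1)
cov_population = cross / n

# (b) Product moment formula.
r_product = cross / (math.sqrt(ss_x) * math.sqrt(ss_y))

# (c) Short computational formula, from column totals only.
sum_x, sum_y = sum(fathers), sum(sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))
sum_x2 = sum(x * x for x in fathers)
sum_y2 = sum(y * y for y in sons)
r_short = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
)

print(f"cov (n-1) = {cov_sample:.3f}, cov (n) = {cov_population:.3f}")
print(f"r (product moment) = {r_product:.3f}, r (short) = {r_short:.3f}")
```

Both formulas give the same r, about 0.703.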

Regression Analysis
In regression analysis we use the independent
variable (X) to estimate the dependent variable (Y).
• The relationship between the variables is linear.
• Both variables must be at least interval scale.
• The least squares criterion is used to determine the
equation.

Linear Regression Model

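Computing the Slope of the Line

$$ b = r\,\frac{s_y}{s_x} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (X-\bar{X})^2} = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} $$
where $s_x$ and $s_y$ are the sample standard deviations of X and Y.

Computing the Y-Intercept

$$ a = \bar{Y} - b\bar{X} $$

Least Squares Principle

• The least squares principle is used to obtain a and b: they are
chosen so that the sum of squared vertical distances
$\sum (Y-\hat{Y})^2$ between the observed and estimated values is
as small as possible.
• The equations to determine a and b are:
$$ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}, \qquad a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} $$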
Example (4)
Recall the example involving Copier Sales of
America. The sales manager gathered information on the
number of sales calls made and the number of copiers sold
for a random sample of 10 sales representatives. Use the
least squares method to determine a linear equation to
express the relationship between the two variables.
What is the expected number of copiers sold by a
representative who made 20 calls?

Answer

Step 3 - Find the regression equation

The regression equation is:
$$ \hat{Y} = a + bX $$
$$ \hat{Y} = 18.9476 + 1.1842X $$
For a representative who made 20 calls:
$$ \hat{Y} = 18.9476 + 1.1842(20) = 42.6316 $$
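The least squares coefficients can be reproduced with a small Python sketch. The function name `least_squares` is an illustrative choice, and the data are the copier sales figures from the table in Example (5). The code keeps b unrounded, so the intercept comes out as 18.9474 rather than 18.9476; the small difference comes from using the rounded slope 1.1842 in hand computation.

```python
# Sales calls (X) and copiers sold (Y) for the 10 representatives.
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]


def least_squares(x, y):
    """Return (a, b) for the line Y-hat = a + b*X fitted by least squares."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = least_squares(calls, copiers)
predicted_20 = a + b * 20  # expected copiers sold after 20 calls

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"Estimate for 20 calls: {predicted_20:.4f}")
```

The estimate for 20 calls agrees with the hand result, about 42.63 copiers.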

Example (5)

Computing the estimates of Y: using the regression equation
$\hat{Y} = 18.9476 + 1.1842X$, substitute the value of each X
to solve for the estimated sales.

Sales calls (X)   Copiers sold (Y)   Estimated sales (Ŷ)
      20                 30               42.6316
      40                 60               66.3156
      20                 40               42.6316
      30                 60               54.4736
      10                 30               30.7896
      10                 40               30.7896
      20                 40               42.6316
      20                 50               42.6316
      20                 30               42.6316
      30                 70               54.4736
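The column of estimates can be generated by substituting each X into the regression equation. This sketch uses the rounded coefficients 18.9476 and 1.1842 exactly as they appear in the notes; the variable names are illustrative.

```python
# Rounded coefficients of the regression equation, as stated in the notes.
a, b = 18.9476, 1.1842

calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

# Estimated sales for every representative, rounded to 4 decimals.
estimates = [round(a + b * x, 4) for x in calls]

for x, y, y_hat in zip(calls, copiers, estimates):
    print(f"{x:>3} {y:>3} {y_hat:>9.4f}")
```

Representatives with the same number of calls get the same estimate, so only four distinct values appear in the column.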

The Standard Error of Estimate

• The standard error of estimate measures the
scatter, or dispersion, of the observed values around
the line of regression.
• The formulas that are used to compute the standard
error:

$$ s_{y \cdot x} = \sqrt{\frac{\sum (Y-\hat{Y})^2}{n-2}} $$

$$ s_{y \cdot x} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n-2}} $$

Example (6)
Recall the example involving Copier Sales of
America. The sales manager determined the least
squares regression equation given below.
Determine the standard error of estimate as a
measure of how well the values fit the regression line.

$$ \hat{Y} = 18.9476 + 1.1842X $$

$$ s_{y \cdot x} = \sqrt{\frac{\sum (Y-\hat{Y})^2}{n-2}} = \sqrt{\frac{784.211}{10-2}} = 9.901 $$
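A minimal Python sketch of both standard error formulas, applied to the copier data, is given below. It recomputes a and b without rounding, which is an implementation choice made here rather than part of the notes. The two formulas should then agree to within floating-point error.

```python
import math

calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]
n = len(calls)

sum_x, sum_y = sum(calls), sum(copiers)
sum_xy = sum(x * y for x, y in zip(calls, copiers))
sum_x2 = sum(x * x for x in calls)
sum_y2 = sum(y * y for y in copiers)

# Unrounded least squares coefficients.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Definition: squared residuals summed, divided by n - 2.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(calls, copiers))
s_definition = math.sqrt(sse / (n - 2))

# Shortcut formula using column totals only.
s_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(f"Sum of squared residuals = {sse:.3f}")
print(f"s_y.x (definition) = {s_definition:.3f}")
print(f"s_y.x (shortcut)   = {s_shortcut:.3f}")
```

The sum of squared residuals is 784.211 and the standard error is 9.901, matching the hand computation.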

Example (7)
Consider the following data on the respective masses X and Y
of a sample of 12 fathers and their oldest sons.
X 65 63 67 64 68 62 70 66 68 67 69 71
Y 68 66 68 65 69 66 68 65 71 67 68 70
(a) Use the least squares method to determine a linear
equation to express the relationship between the
two variables.
(b) Compute the estimates of Y.
(c) Determine the standard error of estimate as a
measure of how well the values fit the
regression line.

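 X    Y    X²     XY     Y²      Ŷ
65   68   4225   4420   4624   66.786
63   66   3969   4158   4356   65.8332
67   68   4489   4556   4624   67.7388
64   65   4096   4160   4225   66.3096
68   69   4624   4692   4761   68.2152
62   66   3844   4092   4356   65.3568
70   68   4900   4760   4624   69.168
66   65   4356   4290   4225   67.2624
68   71   4624   4828   5041   68.2152
67   67   4489   4489   4489   67.7388
69   68   4761   4692   4624   68.6916
71   70   5041   4970   4900   69.6444
800  811  53418  54107  54849

The last row gives the column sums.

(a) The slope and intercept are
$$ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{12(54107) - (800)(811)}{12(53418) - (800)^2} = \frac{484}{1016} \approx 0.4764 $$
$$ a = \bar{Y} - b\bar{X} = \frac{811}{12} - 0.4764\left(\frac{800}{12}\right) \approx 35.82 $$
so the regression equation is
$$ \hat{Y} = 35.82 + 0.4764X $$

(b) Substituting each X into $\hat{Y} = 35.82 + 0.4764X$ gives the
last column of the table.

(c) The standard error of estimate is
$$ s_{y \cdot x} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n-2}} = \sqrt{\frac{54849 - (35.82)(811) - (0.4764)(54107)}{12-2}} \approx 1.497 $$
The numerator is a small difference of large numbers, so this value
is sensitive to the rounding of a and b; with unrounded coefficients
the result is about 1.40.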
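Example (7) can be worked end to end in a few lines of Python. This is a sketch with unrounded coefficients; the names are illustrative. The standard error is computed from the residuals directly because the shortcut formula subtracts large, nearly equal numbers, so rounded coefficients such as a = 35.82 and b = 0.4764 can shift the result noticeably.

```python
import math

# Data of Example (7): X = fathers, Y = oldest sons.
fathers = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
sons = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]
n = len(fathers)

sum_x, sum_y = sum(fathers), sum(sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))
sum_x2 = sum(x * x for x in fathers)

# (a) Least squares slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# (b) Estimates of Y for every observed X.
estimates = [a + b * x for x in fathers]

# (c) Standard error of estimate from the residuals.
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(sons, estimates))
s_yx = math.sqrt(sse / (n - 2))

print(f"Y-hat = {a:.2f} + {b:.4f} X")
print(f"s_y.x = {s_yx:.2f}")
```

With unrounded coefficients the standard error comes out near 1.40.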

H.W
1) Consider the following data
X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9
(a) Compute the covariance.
(b) Find the correlation coefficient using the product moment formula.
(c) Find the correlation coefficient using the short computational
formula.

2) Consider the following data
X 18 12 20 15 10
Y 17 13 19 12 9
(a) Compute the covariance.
(b) Find the correlation coefficient using the product moment formula.
(c) Find the correlation coefficient using the short computational
formula.

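H.W (1)

 X    Y    X²    XY    Y²   X−X̄   Y−Ȳ   (X−X̄)²   (X−X̄)(Y−Ȳ)   (Y−Ȳ)²
 1    1     1     1     1    -6    -4      36         24          16
 3    2     9     6     4    -4    -3      16         12           9
 4    4    16    16    16    -3    -1       9          3           1
 6    4    36    24    16    -1    -1       1          1           1
 8    5    64    40    25     1     0       1          0           0
 9    7    81    63    49     2     2       4          4           4
11    8   121    88    64     4     3      16         12           9
14    9   196   126    81     7     4      49         28          16
56   40   524   364   256     0     0     132         84          56

The last row gives the column sums.

$$ \bar{X} = \frac{\sum X}{n} = \frac{56}{8} = 7 $$

$$ \bar{Y} = \frac{\sum Y}{n} = \frac{40}{8} = 5 $$

(a) Covariance
The covariance is the sum of cross-products divided by n − 1 for a
sample (or by n when the data are treated as a population):
$$ \operatorname{Cov}(X, Y) = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{n-1} = \frac{84}{7} = 12 $$
Dividing by n = 8 instead gives 10.5.

(b) The correlation coefficient using the product moment formula:
$$ r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2}\,\sqrt{\sum (Y-\bar{Y})^2}} = \frac{84}{\sqrt{132}\,\sqrt{56}} \approx 0.977 $$
This shows that there is a very high (strong) linear correlation
between the variables.

(c) The correlation coefficient using the short computational
formula:
$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} $$
$$ r = \frac{8(364) - (56)(40)}{\sqrt{8(524) - (56)^2}\,\sqrt{8(256) - (40)^2}} = \frac{672}{\sqrt{1056}\,\sqrt{448}} \approx 0.977 $$
This shows that there is a very high (strong) linear correlation
between the variables.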

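H.W (2)

 X    Y    X²     XY     Y²   X−X̄   Y−Ȳ   (X−X̄)²   (X−X̄)(Y−Ȳ)   (Y−Ȳ)²
18   17   324    306    289    3     3       9          9           9
12   13   144    156    169   -3    -1       9          3           1
20   19   400    380    361    5     5      25         25          25
15   12   225    180    144    0    -2       0          0           4
10    9   100     90     81   -5    -5      25         25          25
75   70  1193   1112   1044    0     0      68         62          64

The last row gives the column sums.

$$ \bar{X} = \frac{\sum X}{n} = \frac{75}{5} = 15 $$

$$ \bar{Y} = \frac{\sum Y}{n} = \frac{70}{5} = 14 $$

(a) Covariance
The covariance is the sum of cross-products divided by n − 1 for a
sample (or by n when the data are treated as a population):
$$ \operatorname{Cov}(X, Y) = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{n-1} = \frac{62}{4} = 15.5 $$
Dividing by n = 5 instead gives 12.4.

(b) The correlation coefficient using the product moment formula:
$$ r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2}\,\sqrt{\sum (Y-\bar{Y})^2}} = \frac{62}{\sqrt{68}\,\sqrt{64}} \approx 0.940 $$
This shows that there is a very high (strong) linear correlation
between the variables.

(c) The correlation coefficient using the short computational
formula:
$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} $$
$$ r = \frac{5(1112) - (75)(70)}{\sqrt{5(1193) - (75)^2}\,\sqrt{5(1044) - (70)^2}} = \frac{310}{\sqrt{340}\,\sqrt{320}} \approx 0.940 $$
This shows that there is a very high (strong) linear correlation
between the variables.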
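Both homework answers can be checked with one helper built on the short computational formula. This is a sketch; the function name `short_formula_r` and the variable names are illustrative.

```python
import math


def short_formula_r(x, y):
    """Correlation coefficient from column totals (short computational formula)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(
        n * sum_y2 - sum_y ** 2
    )
    return numerator / denominator


# Homework 1 data.
hw1_x = [1, 3, 4, 6, 8, 9, 11, 14]
hw1_y = [1, 2, 4, 4, 5, 7, 8, 9]

# Homework 2 data.
hw2_x = [18, 12, 20, 15, 10]
hw2_y = [17, 13, 19, 12, 9]

r1 = short_formula_r(hw1_x, hw1_y)
r2 = short_formula_r(hw2_x, hw2_y)
print(f"H.W (1): r = {r1:.3f}")
print(f"H.W (2): r = {r2:.3f}")
```

The values are about 0.977 and 0.940, both indicating a strong direct linear relationship.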
