
Dr. Saeed A. Dobbah Alghamdi

Department of Statistics
Faculty of Sciences
Building 90, 2nd Floor, Office 26F41
King Abdulaziz University
http://saalghamdy.kau.edu.sa

Office location
Main Reference

Elementary Statistics
A Step by Step Approach
By
Allan Bluman
Parts of Chapters 10 & 13

Correlation and Regression


Objectives
➢ Draw and interpret a scatter plot for a set of ordered pairs.
➢ Compute and interpret the correlation coefficient using the Pearson linear
correlation coefficient.
➢ Compute and interpret the correlation coefficient using the Spearman rank
correlation coefficient.
➢ Compute and interpret the equation of the regression line.

2023 © All rights are preserved for Dr. Saeed A. Dobbah Alghamdi, Department of Statistics, Faculty of Sciences, KAU 1
Introduction
Inferential statistics involves determining whether a relationship between two or more
numerical variables exists.
Examples:
▪ A businessman may want to know whether the volume of sales for a given
month is related to the amount of advertising the firm does that month.
▪ Educators are interested in determining whether the number of hours a student
studies is related to the student’s score on a particular exam.
▪ Medical researchers are interested in questions such as, Is caffeine related to
heart damage? or Is there a relationship between a person’s age and his or her
blood pressure?
▪ A zoologist may want to know whether the birth weight of a certain animal is
related to its life span.
These are only some of the many questions that can be answered by using the
techniques of correlation and regression analysis.
The purpose of this chapter is to answer such questions statistically:
1. Are two or more variables related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?
In simple correlation and regression studies, the researcher collects data on two
numerical or quantitative variables to see whether a relationship exists between them.
1. An independent variable (explanatory or predictor variable) is the variable that is
being manipulated by the researcher and used to predict the dependent variable.
2. A dependent variable (outcome or response variable) is the resultant variable.
Example:
A manager may wish to see whether the number of years the salespeople have been
working for the company has anything to do with the amount of sales they make.
Scatter Plot
A scatter plot is a graph of the ordered pairs (x, y) that describes the nature of the
relationship between an independent variable (x) and a dependent variable (y).
Researchers look for various types of patterns in scatter plots.
The simple relationship can be positive (direct) or negative (inverse):
▪ A positive (direct) relationship exists when both variables increase or
decrease at the same time.
▪ A negative (inverse) relationship exists when one variable increases and
the other variable decreases.

[Figure: three example scatter plots showing a positive linear relationship, a negative linear relationship, and no linear relationship]

Example:
The following data are the number of absences and the final grades of seven
randomly selected students from a statistics class.

Number of absences 6 2 15 9 12 5 8
Final grades 82 86 43 74 58 90 78

Draw the scatter plot for the variables.

Example: Absences and final grades
Step 1. Input data in columns.
Step 2. Select Data.
Steps 3 to 5. [Menu selections shown in screenshots]

Step 6. Select the column that contains the data for the X variable.
Step 7. Select the column that contains the data for the Y variable.
Step 8. Uncheck the option if you don’t want the fitted regression line.
Step 9. [Shown in screenshot]


[Figure: scatter plot of the absences and final grades data]

The plot indicates that there is a negative linear relationship. Thus, as the number of
absences increases, the final grade decreases on average.
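For readers without the spreadsheet tool, the same picture can be roughed out in a few lines of Python. The sketch below is only an illustration: the helper name `text_scatter` and the grid size are assumptions, not part of the original slides, and in practice a plotting library would normally be used instead of a character grid.

```python
absences = [6, 2, 15, 9, 12, 5, 8]       # x: independent variable
grades = [82, 86, 43, 74, 58, 90, 78]    # y: dependent variable


def text_scatter(xs, ys, width=32, height=12):
    """Return the lines of a rough character-grid scatter plot.

    Hypothetical helper; assumes xs and ys each hold at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale x to a column index and y to a row index.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        # Row 0 is the top of the plot, so larger y values get smaller rows.
        row = round((y_max - y) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    lines = ["|" + "".join(cells).rstrip() for cells in grid]
    lines.append("+" + "-" * width)
    return lines


plot_lines = text_scatter(absences, grades)
print("\n".join(plot_lines))
```

The points run from the upper left to the lower right of the grid, which is the downward pattern described above.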
Correlation
Correlation is a statistical method used to determine whether a relationship
between variables exists.
Statisticians use a measure called the correlation coefficient to determine the
strength and direction of the linear relationship between two variables.
▪ The symbol for the population correlation coefficient is ρ (rho).
▪ The symbol for the sample correlation coefficient is r.
▪ The correlation coefficient is a unitless measure.
▪ The value of r is between −1 and +1 inclusive. That is, −1 ≤ r ≤ +1.
▪ If the values of x and y are interchanged, the value of r will be unchanged.
▪ If the values of x and/or y are converted to a different scale (for example, a change
of units), the value of r will be unchanged.
▪ The value of r is sensitive to outliers and can change dramatically if they are
present in the data.
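Two of these properties can be checked numerically. The sketch below uses the absences and final grades data from the scatter plot example and computes r from deviations about the means, which is an equivalent form of the usual formula. The function name `pearson_r` and the rescaling of grades to fractions are assumptions made for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

r = pearson_r(absences, grades)
r_swapped = pearson_r(grades, absences)           # x and y interchanged
grades_as_fraction = [g / 100 for g in grades]    # hypothetical rescaling: 82 -> 0.82
r_rescaled = pearson_r(absences, grades_as_fraction)

print(round(r, 3), round(r_swapped, 3), round(r_rescaled, 3))
```

All three printed values agree, and the result has no unit attached to it.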
Correlation Coefficient
▪ If there is a strong positive (direct) linear relationship between the variables,
the value of r will be close to +1.
▪ If there is a strong negative (inverse) linear relationship between the
variables, the value of r will be close to −1.
▪ When there is no linear relationship between the variables or only a weak
relationship, the value of r will be close to 0.

[Figure: scale of r running from −1 (strong inverse linear relationship) through 0 (no linear relationship) to +1 (strong direct linear relationship)]

Pearson Linear Correlation Coefficient
The Pearson linear correlation coefficient is one of the measures used to
determine the strength and direction of a linear relationship between two
numerical variables (x, y). Its formula is:

\[
r_p = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}
\]

where n is the number of data pairs (sample size).

Example:
Compute the value of the correlation coefficient between the number of absences
and the final grades.
Number of absences 6 2 15 9 12 5 8
Final grades 82 86 43 74 58 90 78
Step 1. Input data in columns.
Step 2. Select Data.
Steps 3 to 5. [Menu selections shown in screenshots]

Step 6. Select the data in the two columns together.
Step 7. Select [option shown in screenshot].

[Output: the value of the Pearson linear correlation coefficient]

The value indicates that there is a strong negative linear relationship.
Thus, the more absences a student has, the lower the student's final grade is on average.
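As a cross-check on the software output, the Pearson formula can be evaluated directly from the column sums. The following is a minimal sketch using only the Python standard library; the variable names are chosen here for illustration and do not come from the slides.

```python
import math

absences = [6, 2, 15, 9, 12, 5, 8]       # x
grades = [82, 86, 43, 74, 58, 90, 78]    # y

n = len(absences)
sum_x = sum(absences)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)
sum_y2 = sum(y * y for y in grades)

# Numerator and denominator of the Pearson formula, term by term.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_p = numerator / denominator

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)   # 7 57 511 3745 579 38993
print(round(r_p, 3))                             # -0.944
```

A value of about −0.944 is close to −1, which agrees with the interpretation of a strong negative linear relationship.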
Spearman Rank Correlation Coefficient
The Spearman rank correlation coefficient is one of the measures used to
determine the strength and direction of a relationship between two ordinal
variables (x, y), based on their ranks. Its formula is:

\[
r_s = 1 - \frac{6 \sum d^2}{n\left(n^2 - 1\right)}
\]

where d is the difference in the ranks of the two ordinal variables and n is the
number of data pairs or sample size.
Note: The Spearman rank correlation coefficient can also be used with numerical variables,
but the Pearson linear correlation coefficient is generally preferred in that case since it
uses the actual values instead of their ranks.

Example:
Seven new hybrid automobiles were rated by a consumer group and an
independent testing lab. The scale consisted of 1 to 20 points. Is there a
relationship between the ratings?

Automobile A B C D E F G
Consumer Group 6 15 20 9 17 12 8
Testing Lab 9 13 18 2 19 11 7

Example: Hybrid automobile rating
Step 1. Input data in columns.
Step 2. Select the “Data” tab.
Steps 3 to 5. Select [options shown in screenshots].

Step 6. Input data in columns.
Step 7. Select [option shown in screenshot].

[Output: the value of the Spearman rank correlation coefficient]

The value indicates that there is a strong positive relationship between the two rankings.
Thus, the consumer group and the testing lab tend to rank the hybrid automobiles in the
same direction.
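The rank-difference formula can also be worked through by hand or in a short script. The sketch below ranks each set of ratings from 1 (lowest) to n and applies the formula; the helper names `ranks` and `spearman_rs` are assumptions for illustration, and the ranking helper assumes there are no tied ratings, which holds for this data.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; this sketch assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Spearman rank correlation via r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    d_squared = [(rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys))]
    return 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))


consumer_group = [6, 15, 20, 9, 17, 12, 8]
testing_lab = [9, 13, 18, 2, 19, 11, 7]

print(ranks(consumer_group))   # [1, 5, 7, 3, 6, 4, 2]
print(ranks(testing_lab))      # [3, 5, 6, 1, 7, 4, 2]

r_s = spearman_rs(consumer_group, testing_lab)
print(round(r_s, 3))           # 0.821
```

Ranking from highest to lowest instead would give the same differences in absolute value, so the result would not change.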
Example:
Compute the value of the Spearman rank correlation coefficient between the
number of absences and the final grades.
Number of absences 6 2 15 9 12 5 8
Final grades 82 86 43 74 58 90 78
Step 1. Input data in columns.
Step 2. Select Data.
Steps 3 to 5. [Menu selections shown in screenshots]

Step 6. Select the data in the two columns together.
Step 7. Select [option shown in screenshot].

[Output: the value of the Spearman rank correlation coefficient]

The value indicates that there is a strong negative relationship between the ranks.
Thus, the more absences a student has, the lower the student's final grade is on average.
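One way to see what the Spearman coefficient measures is to note that, when there are no ties, it equals the Pearson coefficient computed on the ranks rather than on the raw values. The sketch below illustrates this for the absences and final grades data; the helper names are hypothetical and the ranking helper assumes no tied values.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest) to n; this sketch assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def pearson_r(xs, ys):
    """Pearson r from the sums-based formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))


absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

rank_x = ranks(absences)   # [3, 1, 7, 5, 6, 2, 4]
rank_y = ranks(grades)     # [5, 6, 1, 3, 2, 7, 4]

n = len(absences)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))   # 110
rs_formula = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
rs_via_pearson = pearson_r(rank_x, rank_y)

print(round(rs_formula, 3), round(rs_via_pearson, 3))   # -0.964 -0.964
```

Both routes give about −0.964, slightly stronger than the Pearson value on the raw data, because the ranks only record the ordering of the values.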
Regression Line
Regression is a statistical method used to describe the nature of the relationship
between variables.
▪ In studying relationships between two variables, collect the data and then construct
a scatter plot.
▪ The purpose of the scatter plot, as indicated previously, is to determine the nature of
the relationship between the variables.
▪ The possibilities include a positive linear relationship, a negative linear relationship,
a curvilinear relationship, or no discernible relationship.
▪ After the scatter plot is drawn and a linear relationship is determined, the next steps
are to compute the value of the correlation coefficient and to test the significance of
the relationship.
▪ If the value of the correlation coefficient is significant (testing significance is not
discussed here), the next step is to determine the equation of the regression line, which
is the data’s line of best fit.
▪ Best fit means that the sum of the squares of the vertical distances from the points
to the line is at a minimum.
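The best-fit criterion can be written out as a short derivation. This is the standard least-squares argument, sketched here as a supplement rather than taken from the slides; the symbol S for the sum of squared vertical distances is introduced only for this illustration, with a and b denoting the intercept and slope of the line y' = a + bx.

```latex
\begin{align*}
S(a, b) &= \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2 \\
\frac{\partial S}{\partial a} = 0 \;&\Rightarrow\; \sum y = n a + b \sum x \\
\frac{\partial S}{\partial b} = 0 \;&\Rightarrow\; \sum xy = a \sum x + b \sum x^2
\end{align*}
```

Solving these two equations simultaneously for a and b gives the intercept and slope of the line that minimizes S.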

The Regression Line Equation
▪ The regression line equation can be used to predict a value for the dependent
variable (y) for a given value of the independent variable (x).
▪ The equation of the regression line is written as y' = a + bx, where
a is the y' intercept, which is the predicted y value when x is zero, and
b is the slope of the line, which is the change in the predicted y value for a
one-unit increase in x.
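▪ Formulas for determining the regression line equation y' = a + bx:

\[
a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}
\]

\[
b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}
\]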
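To make the two formulas concrete, the sketch below evaluates them for the absences and final grades data used throughout these slides. It relies only on built-in Python functions; the variable names are chosen for illustration.

```python
absences = [6, 2, 15, 9, 12, 5, 8]       # x
grades = [82, 86, 43, 74, 58, 90, 78]    # y

n = len(absences)
sum_x = sum(absences)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)

# Both formulas share the same denominator.
denominator = n * sum_x2 - sum_x ** 2

a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # intercept
b = (n * sum_xy - sum_x * sum_y) / denominator        # slope

print(round(a, 3), round(b, 3))   # 102.493 -3.622
```

The negative slope matches the negative correlation found for the same data.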
Example: Absences and the final grades.
Find the expected grade for a student who has been absent for 10 lectures.
Number of absences 6 2 15 9 12 5 8
Final grades 82 86 43 74 58 90 78
Step 1. Input data in columns.
Step 2. Select Data.
Steps 3 to 5. [Menu selections shown in screenshots]

Step 6. Select the column that contains the data for the X variable.
Step 7. Select the column that contains the data for the Y variable.
Step 8. Change the drop-down menu to “Type in predictor values”.
Step 9. Type in the value.
Step 10. [Shown in screenshot]

[Output annotations: the value of the Pearson linear correlation coefficient, the value of
the regression intercept “a”, the value of the regression slope “b”, and the predicted
value when x = 10]

The equation of the fitted regression line is y' = 102.493 − 3.622x.
Hence, for a student who has been absent for 10 lectures, we expect the final grade to be
66 on average in the statistics class.
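The prediction is a single substitution into the fitted equation. The sketch below uses the rounded coefficients reported for the fitted line; the function name `predict_grade` is an assumption made for illustration.

```python
INTERCEPT = 102.493   # "a" from the fitted regression line
SLOPE = -3.622        # "b" from the fitted regression line


def predict_grade(absences):
    """Predicted final grade y' = a + b*x for a given number of absences."""
    return INTERCEPT + SLOPE * absences


predicted = predict_grade(10)
print(round(predicted, 2))   # 66.27
print(round(predicted))      # 66
```

Predictions of this kind are most trustworthy for x values inside the range of the observed data, which here is 2 to 15 absences.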

Summary
✓ One way to determine whether a relationship between variables exists is to use the
statistical techniques known as correlation and regression.
✓ The strength and direction of the relationship are measured by the value of the correlation
coefficient, which takes values between −1 and +1 inclusive.
✓ The closer the value of the correlation coefficient is to −1 or +1, the stronger the linear
relationship is between the variables.
✓ A value of −1 or +1 indicates a perfect linear relationship.
✓ To determine the shape of a relationship, one draws a scatter plot of the variables. If the
relationship is linear, the data can be approximated by a straight line, called the regression
line, or the line of best fit.
✓ The closer the value of the correlation coefficient is to −1 or +1, the closer the points will
fit the line.
✓ The sign of the slope of the regression line indicates the direction of the relationship.
A positive slope means a positive relationship, and a negative slope means a negative
relationship.