Dr. Saeed A. Dobbah Alghamdi Department of Statistics Faculty of Sciences King Abdulaziz University
Building 90, 2nd Floor, Office 26F41
http://saalghamdy.kau.edu.sa
Office location
Main Reference
Elementary Statistics: A Step by Step Approach, by Allan Bluman
Parts of Chapters 10 and 13
2023 © All rights are preserved for Dr. Saeed A. Dobbah Alghamdi, Department of Statistics, Faculty of Sciences, KAU
Introduction
Inferential statistics involves determining whether a relationship between two or more
numerical variables exists.
Examples:
▪ A businessman may want to know whether the volume of sales for a given
month is related to the amount of advertising the firm does that month.
▪ Educators are interested in determining whether the number of hours a student
studies is related to the student’s score on a particular exam.
▪ Medical researchers are interested in questions such as, Is caffeine related to
heart damage? or Is there a relationship between a person’s age and his or her
blood pressure?
▪ A zoologist may want to know whether the birth weight of a certain animal is
related to its life span.
These are only some of the many questions that can be answered by using the
techniques of correlation and regression analysis.
The purpose of this chapter is to answer such questions statistically:
1. Are two or more variables related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?
In simple correlation and regression studies, the researcher collects data on two
numerical or quantitative variables to see whether a relationship exists between them.
1. An independent variable (explanatory or predictor variable) is the variable that is
being manipulated by the researcher and used to predict the dependent variable.
2. A dependent variable (outcome or response variable) is the resultant variable.
Example:
A manager may wish to see whether the number of years the salespeople have been
working for the company has anything to do with the amount of sales they make.
Scatter Plot
A scatter plot is a graph of ordered pairs (x, y) that shows the nature of the
relationship between an independent variable (x) and a dependent variable (y).
Researchers look for various types of patterns in scatter plots.
The simple relationship can be positive (direct) or negative (inverse):
▪ A positive (direct) relationship exists when both variables increase or
decrease at the same time.
▪ A negative (inverse) relationship exists when one variable increases and
the other variable decreases.
Example:
The following data are the number of absences and the final grades of seven
randomly selected students from a statistics class.
Number of absences    6   2  15   9  12   5   8
Final grades         82  86  43  74  58  90  78
Example: Absences and final grades
Step 1. Input data in columns.
Step 2. Select Data.
Steps 3 to 5. [screenshots]
Step 6. Select the column that contains the data for the X variable.
Step 7. Select the column that contains the data for the Y variable.
Step 9. [screenshot]
[Figure: scatter plot of final grades against number of absences]
The plot indicates that there is a negative linear relationship. Thus, as the
number of absences increases, the final grade decreases on average.
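The direction read from the plot can be cross-checked numerically. The sketch below is plain Python with illustrative variable names that are not part of the original slides. It sums the products of the deviations of x and y from their means; a negative sum is consistent with a negative relationship.

```python
# Absences (x) and final grades (y) from the example above.
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

n = len(absences)
mean_x = sum(absences) / n
mean_y = sum(grades) / n

# Sum of cross-deviations: positive when x and y tend to move together,
# negative when one tends to fall as the other rises.
cross_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(absences, grades))

if cross_dev < 0:
    direction = "negative"
elif cross_dev > 0:
    direction = "positive"
else:
    direction = "none"

print(f"sum of cross-deviations = {cross_dev:.1f} ({direction} relationship)")
```

This only indicates direction. It does not replace the plot, which also shows whether the pattern is roughly linear.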
Correlation
Correlation is a statistical method used to determine whether a relationship
between variables exists.
Statisticians use a measure called the correlation coefficient to determine the
strength and direction of the linear relationship between two variables.
▪ The symbol for the population correlation coefficient is ρ (rho).
▪ The symbol for the sample correlation coefficient is r.
▪ The correlation coefficient is a unitless measure.
▪ The value of r lies between −1 and +1 inclusive; that is, −1 ≤ r ≤ +1.
▪ If the values of x and y are interchanged, the value of r will be unchanged.
▪ If the values of x and/or y are converted to a different scale, the value of r will
be unchanged.
▪ The value of r is sensitive to outliers and can change dramatically if they are
present in the data.
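The properties listed above can be checked on the absences and final grades data. This is a minimal sketch: the helper `pearson_r` and the extra data point are assumptions made for illustration, not part of the original slides. The rescaling shown is a change of units by a positive factor.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

r = pearson_r(absences, grades)

# Interchanging x and y leaves r unchanged.
r_swapped = pearson_r(grades, absences)

# Rescaling (here: grades as a fraction of 100) leaves r unchanged.
r_rescaled = pearson_r(absences, [g / 100 for g in grades])

# One hypothetical outlier (20 absences, grade 100) changes r sharply.
r_outlier = pearson_r(absences + [20], grades + [100])

print(round(r, 3), round(r_swapped, 3), round(r_rescaled, 3), round(r_outlier, 3))
```

The first three printed values agree, while the fourth is much closer to 0, which illustrates how a single unusual pair can weaken the measured linear relationship.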
Correlation Coefficient
▪ If there is a strong positive (direct) linear relationship between the variables,
the value of r will be close to +1.
▪ If there is a strong negative (inverse) linear relationship between the
variables, the value of r will be close to −1.
▪ When there is no linear relationship between the variables or only a weak
relationship, the value of r will be close to 0.
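[Figure: scale of r from −1 (strong negative linear relationship) through 0 (no linear relationship) to +1 (strong positive linear relationship)]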
Pearson Linear Correlation Coefficient
The Pearson linear correlation coefficient is one of the formulas used to
determine the strength and direction of a linear relationship between two
numerical variables (x, y):
$$ r_p = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}} $$
where n is the number of data pairs (sample size).
Example:
Compute the value of the correlation coefficient between the number of absences
and the final grades.
Number of absences    6   2  15   9  12   5   8
Final grades         82  86  43  74  58  90  78
Step 1. Input data in columns.
Step 2. Select Data.
Steps 3 to 5. [screenshots]
Example: Absences and the final grades.
Step 6. Select the data in the two columns together.
Step 7. Select [screenshot]
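The same calculation can be sketched outside the spreadsheet. The Python below applies the computational formula for the Pearson coefficient directly to the absences and final grades data. The variable names are illustrative assumptions, not taken from the slides.

```python
from math import sqrt

absences = [6, 2, 15, 9, 12, 5, 8]      # x
grades = [82, 86, 43, 74, 58, 90, 78]   # y

n = len(absences)
sum_x = sum(absences)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)
sum_y2 = sum(y * y for y in grades)

# Numerator and denominator of the computational formula for r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_p = numerator / denominator

print(f"r_p = {r_p:.3f}")  # approximately -0.944
```

A value this close to −1 agrees with the strong negative linear pattern seen in the scatter plot.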
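The Spearman rank correlation coefficient, used when the two variables are ordinal, is:
$$ r_s = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} $$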
where d is the difference in the ranks of the two ordinal variables and n is the
number of data pairs or sample size.
Note: Spearman rank correlation coefficient can be used with numerical variables
but Pearson linear correlation coefficient will be better in this case since it uses
the actual values instead of using their ranks.
Spearman Rank Correlation Coefficient
Example:
Seven new hybrid automobiles were rated by a consumer group and an
independent testing lab. The scale consisted of 1 to 20 points. Is there a
relationship between the ratings?
Automobile        A   B   C   D   E   F   G
Consumer Group    6  15  20   9  17  12   8
Testing Lab       9  13  18   2  19  11   7
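As a rough check on this example, the sketch below ranks each set of ratings and applies the Spearman formula r_s = 1 − 6Σd² / (n(n² − 1)). The helper names `ranks` and `spearman_rs` are assumptions made for illustration, and the ranking helper assumes there are no tied ratings, which holds for this data.

```python
def ranks(values):
    """Rank values from smallest (1) to largest; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

consumer_group = [6, 15, 20, 9, 17, 12, 8]
testing_lab = [9, 13, 18, 2, 19, 11, 7]

r_s = spearman_rs(consumer_group, testing_lab)
print(f"r_s = {r_s:.3f}")  # approximately 0.821
```

Here Σd² = 10 and n = 7, so r_s = 1 − 60/336, which suggests a fairly strong positive relationship between the two sets of ratings.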
Example: Hybrid automobile rating
Step 1. Input data in columns.
Step 2. Select the “Data” tab.
Steps 3 to 5. Select [screenshots]
Step 6. Input data in columns.
Step 7. Select [screenshot]
Example: Absences and the final grades.
Step 6. Select the data in the two columns together.
Step 7. Select [screenshot]
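For numerical data such as absences and grades, both coefficients can be computed and compared. The sketch below relies on a standard fact: when there are no ties, the Spearman coefficient equals the Pearson coefficient computed on the ranks. The helper names are illustrative assumptions, not part of the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values from smallest (1) to largest; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

r_p = pearson_r(absences, grades)                 # uses the actual values
r_s = pearson_r(ranks(absences), ranks(grades))   # uses only the ranks

print(f"Pearson  r_p = {r_p:.3f}")   # approximately -0.944
print(f"Spearman r_s = {r_s:.3f}")   # approximately -0.964
```

Both values point to a strong negative relationship. The Spearman value reflects only the ordering of the data, while the Pearson value also uses the distances between the actual values.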
The Regression Line Equation
▪ The regression line equation can be used to predict a value for the dependent
variable (y) for a given value of the independent variable (x).
▪ The equation of the regression line is written as y' = a + bx, where
a is the y' intercept, which is the predicted y value when x is zero, and
b is the slope of the line, which is the change in the predicted y value for
each one-unit increase in x.
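▪ Formulas for determining the regression line equation y' = a + bx:
$$ a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} $$
$$ b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} $$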
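As a worked illustration, the formulas for a and b can be applied to the absences and final grades data. The sums used here (n = 7, Σx = 57, Σy = 511, Σxy = 3745, Σx² = 579) are computed from the example data rather than quoted from the slides, and the results are rounded to three decimals.

```latex
a = \frac{(511)(579) - (57)(3745)}{7(579) - (57)^2}
  = \frac{295869 - 213465}{4053 - 3249}
  = \frac{82404}{804} \approx 102.493

b = \frac{7(3745) - (57)(511)}{7(579) - (57)^2}
  = \frac{26215 - 29127}{4053 - 3249}
  = \frac{-2912}{804} \approx -3.622
```

Under these sums the fitted line is approximately y' = 102.493 − 3.622x.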
Example: Absences and the final grades.
Find the expected grade for a student who has been absent for 10 lectures.
Number of absences    6   2  15   9  12   5   8
Final grades         82  86  43  74  58  90  78
Step 1. Input data in columns.
Step 2. Select Data.
Steps 3 to 5. [screenshots]
Step 6. Select the column that contains the data for the X variable.
Step 7. Select the column that contains the data for the Y variable.
Step 10. [screenshot]
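The prediction asked for in this example can also be sketched in a few lines of Python. The code below applies the formulas for a and b to the data and evaluates y' = a + bx at x = 10. The names `a`, `b`, and `predict` are illustrative assumptions, not part of the slides.

```python
absences = [6, 2, 15, 9, 12, 5, 8]      # x
grades = [82, 86, 43, 74, 58, 90, 78]   # y

n = len(absences)
sum_x = sum(absences)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)

# Intercept (a) and slope (b) of the regression line y' = a + bx.
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b = (n * sum_xy - sum_x * sum_y) / denominator

def predict(x):
    """Predicted final grade for a given number of absences."""
    return a + b * x

print(f"y' = {a:.3f} + ({b:.3f})x")
print(f"predicted grade for 10 absences: {predict(10):.1f}")  # approximately 66.3
```

The prediction is most reasonable for x values within the range of the observed data, here 2 to 15 absences.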
The value of the Pearson linear correlation coefficient
Summary
✓ One way to determine whether a relationship between variables exists is to use the
statistical techniques known as correlation and regression.
✓ The strength and direction of the relationship are measured by the value of the correlation
coefficient, which takes values from −1 to +1 inclusive.
✓ The closer the value of the correlation coefficient is to −1 or +1, the stronger the linear
relationship is between the variables.
✓ A value of −1 or +1 indicates a perfect linear relationship.
✓ To determine the shape of a relationship, one draws a scatter plot of the variables. If the
relationship is linear, the data can be approximated by a straight line, called the regression
line, or the line of best fit.
✓ The closer the value of the correlation coefficient is to −1 or +1, the closer the points will
fit the line.
✓ The sign of the slope of the regression line indicates the direction of the relationship.
A positive slope means a positive relationship, and a negative slope means a negative
relationship.