Chapter 11 - Regression and Correlation Analysis-2
Simple linear regression and correlation
• Two variables are measured simultaneously.
• The relationship between the two variables is investigated.
Let x and y be the two variables, observed on a sample of size n. The data consist of n pairs of values:
x: $x_1, x_2, x_3, \ldots, x_n$
y: $y_1, y_2, y_3, \ldots, y_n$
Example: Scatterplot: The table below shows the number of absences (x) in a Calculus
course and the final exam grade (y) for 7 students. Plot a scatterplot of the data:
Number of absences (x) 1 0 2 6 4 3 3
Exam grades (y) 95 90 90 55 70 80 85
Comment: There is a negative linear relationship between the number of absences and
the final exam grade.
Example scatterplot: The time x in years that an employee spent at a company and
the employee’s hourly pay, y, for 9 employees are listed in the table below.
Hourly pay (y)    Time (x)
25                5
20                3
21                4
42                10
38                15
15                2
44                13
40                12
27                7
[Figure: Scatterplot showing the relationship between time spent in a company and hourly pay.]
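The equation of the least-squares regression line is
$$\hat{y} = a + bx$$
where:
$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right)}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2} \quad \text{(slope)}$$
$$a = \bar{y} - b\bar{x} \quad \text{(intercept)}$$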
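The slope and intercept formulas can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function name `least_squares_line` is hypothetical, and it assumes two equal-length lists of numbers in which the x values are not all equal (otherwise $S_{xx} = 0$ and the slope is undefined).

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x,
    using the summation forms of S_xy and S_xx."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    s_xy = sum_xy - sum_x * sum_y / n   # S_xy = sum(xy) - (sum x)(sum y)/n
    s_xx = sum_x2 - sum_x ** 2 / n      # S_xx = sum(x^2) - (sum x)^2/n
    if s_xx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    b = s_xy / s_xx                     # slope
    a = sum_y / n - b * sum_x / n       # intercept: y-bar - b * x-bar
    return a, b

# Points lying exactly on y = 2 + 3x recover the intercept and slope.
a, b = least_squares_line([0, 1, 2, 3], [2, 5, 8, 11])
print(a, b)  # 2.0 3.0
```

Using the raw sums mirrors the hand-calculation table; for very large or nearly constant data, the deviation form $\sum (x - \bar{x})(y - \bar{y})$ is numerically safer.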
Example: Regression equation
The table below shows the number of absences (x) in a Calculus course and the
final exam grade (y) for 7 students. Construct the equation of the regression line:
x    y     x²    y²      xy
1    95    1     9025    95
0    90    0     8100    0
2    90    4     8100    180
6    55    36    3025    330
4    70    16    4900    280
3    80    9     6400    240
3    85    9     7225    255
Totals: ∑x = 19, ∑y = 565, ∑x² = 75, ∑y² = 46775, ∑xy = 1380
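The x², y² and xy columns of this table, and their column totals, can be generated with a short Python sketch. The variable names are illustrative only; the data are the 7 students' absences and exam grades given above.

```python
# Absences (x) and final exam grades (y) for the 7 students
x = [1, 0, 2, 6, 4, 3, 3]
y = [95, 90, 90, 55, 70, 80, 85]

# One row per student: x, y, x^2, y^2, xy
rows = [(xi, yi, xi ** 2, yi ** 2, xi * yi) for xi, yi in zip(x, y)]
for row in rows:
    print(*row, sep="\t")

# Column totals: sum x, sum y, sum x^2, sum y^2, sum xy
totals = tuple(sum(col) for col in zip(*rows))
print("Sum", *totals, sep="\t")
```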
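$$b = \frac{\sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right)}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2} = \frac{1380 - \frac{1}{7}(19)(565)}{75 - \frac{1}{7}(19)^2} = -6.5549$$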
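$$\bar{x} = \frac{\sum x}{n} = \frac{19}{7} = 2.7143; \qquad \bar{y} = \frac{\sum y}{n} = \frac{565}{7} = 80.7143$$
$$a = \bar{y} - b\bar{x} = 98.5061$$
$$\hat{y} = a + bx = 98.5061 - 6.5549x$$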
Estimate the final exam grade of a student who is absent for seven days.
$$\hat{y}\big|_{x=7} = 98.5061 - 6.5549(7) = 52.6218$$
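A minimal Python sketch that reproduces this worked example from the raw data (variable names are illustrative). It keeps the coefficients unrounded, so the prediction comes out at about 52.622; the hand calculation gives 52.6218 because it first rounds a and b to four decimals.

```python
# Absences (x) and final exam grades (y) for the 7 students
x = [1, 0, 2, 6, 4, 3, 3]
y = [95, 90, 90, 55, 70, 80, 85]
n = len(x)

s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

b = s_xy / s_xx                   # slope
a = sum(y) / n - b * sum(x) / n   # intercept a = y-bar - b * x-bar

print(f"y-hat = {a:.4f} + ({b:.4f})x")   # y-hat = 98.5061 + (-6.5549)x
# Prediction for x = 7 days, using the unrounded coefficients
print(f"predicted grade at x = 7: {a + b * 7:.2f}")  # 52.62
```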
Plotting the regression line on the scatterplot:
[Figure: Scatterplot showing the relationship between number of absences and exam grades, with the fitted regression line; x-axis: Number of absences, y-axis: Exam grade; two points are labelled A and B.]
Correlation
The correlation coefficient (r) measures the strength and direction of the linear relationship between two variables x and y.
Correlation coefficient guideline for interpretation:
$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}, \qquad \text{where } -1 \le r \le +1$$
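The formula for r can be sketched in Python. This is an illustration under stated assumptions, not a library API: the function names `correlation` and `correlation_from_deviations` are hypothetical, and both assume equal-length lists where neither variable is constant (otherwise $S_{xx}$ or $S_{yy}$ is zero). The second function uses deviations about the means, which shows that the summation shortcut and the definitional form give the same r.

```python
import math

def correlation(xs, ys):
    """Pearson r = S_xy / sqrt(S_xx * S_yy), using the summation forms
    S_xy = sum(xy) - (sum x)(sum y)/n, S_xx = sum(x^2) - (sum x)^2/n, etc."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

def correlation_from_deviations(xs, ys):
    """The same r, computed from deviations about the means x-bar and y-bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Absences (x) and exam grades (y) for the 7 students
x = [1, 0, 2, 6, 4, 3, 3]
y = [95, 90, 90, 55, 70, 80, 85]
print(f"{correlation(x, y):.4f}")                  # -0.9270
print(f"{correlation_from_deviations(x, y):.4f}")  # -0.9270
```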
Example: Correlation coefficient: Using the data on the number of absences (x) in a Calculus
course and the final exam grade (y) for 7 students, calculate the correlation coefficient.
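$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{\sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[\sum x^2 - \frac{1}{n}\left(\sum x\right)^2\right]\left[\sum y^2 - \frac{1}{n}\left(\sum y\right)^2\right]}}$$
$$r = \frac{1380 - \frac{1}{7}(19)(565)}{\sqrt{\left[75 - \frac{1}{7}(19)^2\right]\left[46775 - \frac{1}{7}(565)^2\right]}} = -0.9270$$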
Comment: There is a strong negative relationship between the number of absences (x) and the final exam grade (y).
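Calculating the coefficient of determination (r²):
$$r^2 = \frac{\left[\sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right)\right]^2}{\left[\sum x^2 - \frac{1}{n}\left(\sum x\right)^2\right]\left[\sum y^2 - \frac{1}{n}\left(\sum y\right)^2\right]}$$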
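A minimal Python sketch of this formula applied to the absences and exam-grade data (variable names are illustrative). It assumes the usual reading of r² as the proportion of the variation in y accounted for by the linear relationship with x.

```python
# Absences (x) and final exam grades (y) for the 7 students
x = [1, 0, 2, 6, 4, 3, 3]
y = [95, 90, 90, 55, 70, 80, 85]
n = len(x)

s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
s_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n

# Coefficient of determination: S_xy^2 / (S_xx * S_yy), i.e. r squared
r_squared = s_xy ** 2 / (s_xx * s_yy)
print(f"r^2 = {r_squared:.4f}")                                # r^2 = 0.8593
print(f"about {100 * r_squared:.1f}% of the variation in y")  # about 85.9% ...
```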
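Step 4: Critical values (t-distribution)
$$t_{\alpha/2;\,n-2} \quad \text{and} \quad t_{1-\alpha/2;\,n-2} = -t_{\alpha/2;\,n-2}$$
If $t \le -t_{\alpha/2;\,n-2}$ or $t \ge t_{\alpha/2;\,n-2}$, reject $H_0$: the correlation is significant.
If $-t_{\alpha/2;\,n-2} < t < t_{\alpha/2;\,n-2}$, do not reject $H_0$: the correlation is not significant.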
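The two-tailed decision rule can be sketched in Python. The function name `correlation_test_decision` and the dictionary `T_CRIT_0_05` are hypothetical; the dictionary holds standard t-table values $t_{0.025;\,\nu}$ (α = 0.05, two-tailed) for ν = 1 to 10 degrees of freedom only, so other significance levels or larger samples would need a full t-table or a statistics library. The usage line assumes α = 0.05 and the test statistic t = −5.53 from the absences example (n = 7).

```python
# Hypothetical lookup table: two-tailed critical values t_{0.025; df}
# (significance level alpha = 0.05) for df = 1..10, from a standard t-table.
T_CRIT_0_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}

def correlation_test_decision(t_stat, t_crit):
    """Two-tailed rule: reject H0 if t <= -t_crit or t >= t_crit,
    where t_crit is the positive critical value t_{alpha/2; n-2}."""
    if t_stat <= -t_crit or t_stat >= t_crit:
        return "reject H0: significant correlation"
    return "do not reject H0: no significant correlation"

# Usage with n = 7 (df = n - 2 = 5), assuming alpha = 0.05
n = 7
print(correlation_test_decision(-5.53, T_CRIT_0_05[n - 2]))
# reject H0: significant correlation
```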
Step 3: Test statistic
$$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{-0.9270}{\sqrt{\frac{1 - (-0.9270)^2}{7 - 2}}} = -5.53$$
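A minimal Python sketch of the test statistic; the function name `correlation_t_statistic` is hypothetical. It assumes n > 2 and |r| < 1, since the statistic is undefined otherwise.

```python
import math

def correlation_t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), compared with t on n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need n > 2")
    if abs(r) >= 1:
        raise ValueError("|r| must be < 1 for a finite t")
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Absences example: r = -0.9270 with n = 7 students
t = correlation_t_statistic(-0.9270, 7)
print(round(t, 2))  # -5.53
```

At an assumed significance level of 0.05 with 5 degrees of freedom, the table value is 2.571, and since −5.53 ≤ −2.571 the rule in Step 4 would reject $H_0$.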
Exercise
A data analyst working for a transportation company has been tasked with
studying the relationship between the number of hours a delivery truck is on the
road and the total number of deliveries it makes in a day. Data is collected from 8
different days:
Hours on road (X)    Number of deliveries (Y)
5 25
9 35
8 32
7 30
4 22
10 38
3 20
6 27
1.1 Plot a scatterplot of the number of hours on the road against the number of deliveries.
1.2 Calculate the regression equation using the least squares method to model the relationship
between hours on the road and the number of deliveries. What can you conclude about the
slope?
1.3 Estimate the number of deliveries for a delivery truck that is on the road for twelve hours.
1.4 Calculate and interpret the correlation coefficient to determine the strength and the
direction of the relationship between these two variables.
1.5 Calculate and interpret the coefficient of determination between these two variables.
1.6 Perform a hypothesis test to determine whether the correlation coefficient between hours
on the road and the number of deliveries is statistically significant. Use a significance level
of 0.05.