Regression

• If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data's line of best fit.

Bluman Chapter 10 2
• Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum.

Regression Line: y = a + bx

Least squares estimates of a and b:

$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$

$$a = \bar{y} - b\bar{x}$$

where a is the y-intercept and b is the slope of the line.
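As a minimal sketch of how the least squares formulas can be evaluated, the Python function below computes a and b from paired data. The function name `least_squares_line` and the four sample points are illustrative assumptions, not part of the slides; the points were chosen to lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x fitted by least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired (x, y) observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = sum_y / n - b * (sum_x / n)            # intercept: y-bar minus b times x-bar
    return a, b


# Made-up points lying exactly on y = 1 + 2x (illustration only).
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(round(a, 6), round(b, 6))  # 1.0 2.0
```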

Interpretation of a and b

1. Interpretation of a (intercept/constant):
The intercept is always the predicted mean of y when x = 0, so it is meaningful only if x = 0 is itself a meaningful value AND x = 0 lies within the range of the data set.

➢ If x = 0 is in the range of the data set, the interpretation of a is:
   when x = 0, y is predicted to be a units.

➢ If x = 0 is not in the range of the data set,
   a has no practical interpretation.


2. Interpretation of b (slope):
The slope of the regression line is the number of units by which y increases (positive slope) or decreases (negative slope) for each one-unit increase in x.

➢ If b is positive, the interpretation of b is:
   if x increases by 1 unit, y will increase by b units.

➢ If b is negative, the interpretation of b is:
   if x increases by 1 unit, y will decrease by |b| units.
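The slope interpretation follows directly from the form of the line. As a short worked step, compare the predicted value at x + 1 with the predicted value at x; the notation y(x) for "the predicted y at x" is introduced here only for this illustration.

```latex
y(x+1) - y(x) = \bigl[a + b(x+1)\bigr] - \bigl[a + bx\bigr] = b
```

So a one-unit increase in x changes the predicted y by exactly b, whatever the starting value of x: an increase when b > 0 and a decrease of |b| when b < 0.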

Example 1: Car Rental Companies

Construct a scatter plot for the data shown for car rental companies in the United States for a recent year.

Step 1: Draw and label the x and y axes.
Step 2: Plot each point on the graph.


Find the equation of the regression line for the data in Example 10–4, and graph the line on the scatter plot.

Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67, n = 6

a = 0.396
b = 0.106

y = a + bx, so y = 0.396 + 0.106x
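The coefficients can be checked from the summary sums alone. The sketch below plugs the sums quoted on the slide into the least squares formulas; the variable names are illustrative, and Σy² is not needed for a or b.

```python
# Summary sums as quoted on the slide (n = 6 car rental companies).
n = 6
sum_x, sum_y = 153.8, 18.7
sum_xy, sum_x2 = 682.77, 5859.26

denom = n * sum_x2 - sum_x ** 2
b = (n * sum_xy - sum_x * sum_y) / denom   # slope
a = sum_y / n - b * (sum_x / n)            # intercept, using the unrounded slope

print(f"a = {a:.3f}, b = {b:.3f}")  # a = 0.396, b = 0.106
```

Note that a is computed from the unrounded slope; substituting the rounded value b = 0.106 into a = ȳ − b x̄ would give a slightly different intercept.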

Find two points to sketch the graph of the regression line.

Use any x values between 10 and 60. For example, let x equal 15 and 40. Substitute into the equation and find the corresponding y values.

For x = 15: y = 0.396 + 0.106(15) = 1.986
For x = 40: y = 0.396 + 0.106(40) = 4.636

Plot (15, 1.986) and (40, 4.636), and sketch the resulting line.
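The two substitutions can be reproduced with a small helper. This is only a sketch: the function name `predict` is an assumption, and the coefficients are the rounded values from the slide.

```python
def predict(x, a=0.396, b=0.106):
    """Predicted y on the fitted line y = a + b*x (rounded coefficients from the slide)."""
    return a + b * x


# Two x values inside the range used on the slide (10 to 60).
points = [(x, round(predict(x), 3)) for x in (15, 40)]
print(points)  # [(15, 1.986), (40, 4.636)]
```

Any two distinct x values would define the same line; picking values well apart, as here, simply makes the sketch easier to draw accurately.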


[Figure: scatter plot with the regression line y = 0.396 + 0.106x drawn through the points (15, 1.986) and (40, 4.636).]

Use the equation of the regression line to predict the revenue of a car rental agency that has 200,000 automobiles. Here x is measured in ten thousands of automobiles, so x = 20 corresponds to 200,000 automobiles.

y = 0.396 + 0.106x
  = 0.396 + 0.106(20)
  = 2.516

Hence, when a rental agency has 200,000 automobiles, its predicted revenue is approximately $2.516 billion.
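The only subtle step in this prediction is the change of units. The sketch below makes it explicit; the scale of 10,000 automobiles per unit of x is inferred from the statement that x = 20 corresponds to 200,000 automobiles, and the function and constant names are illustrative assumptions.

```python
A, B = 0.396, 0.106        # intercept and slope from the slide
CARS_PER_X_UNIT = 10_000   # inferred: x = 20 corresponds to 200,000 automobiles


def predicted_revenue_billions(automobiles):
    """Convert a fleet size to x units, then evaluate y = A + B*x (billions of dollars)."""
    x = automobiles / CARS_PER_X_UNIT
    return A + B * x


print(round(predicted_revenue_billions(200_000), 3))  # 2.516
```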

Assumptions for Valid Predictions

1. The sample is a random sample.
2. For any specific value of the independent variable x, the value of the dependent variable y must be normally distributed about the regression line.

3. The standard deviation of the dependent variable must be the same for each value of the independent variable.

Regression

• The magnitude of the change in one variable when the other variable changes by exactly 1 unit is called a marginal change. The slope b of the regression line equation represents the marginal change.
• For valid predictions, the value of the correlation coefficient must be significant.
• When r is not significantly different from 0, the best predictor of y is the mean of the data values of y.
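The three points above can be illustrated together in a short sketch. The function `predict_y` and its `r_is_significant` flag are assumptions made for illustration; the flag is supplied by the caller, and the significance test itself is not shown. The coefficients and Σy come from Example 1.

```python
def predict_y(x, a, b, y_mean, r_is_significant):
    """Use the regression line only when r is significant; otherwise use the mean of y."""
    if r_is_significant:
        return a + b * x
    return y_mean


a, b = 0.396, 0.106   # Example 1 coefficients
y_mean = 18.7 / 6     # mean of y from Example 1 (sum of y divided by n)

# Marginal change: raising x by exactly 1 unit changes the prediction by b.
marginal = predict_y(21, a, b, y_mean, True) - predict_y(20, a, b, y_mean, True)
print(round(marginal, 3))  # 0.106

# If r were not significant, the prediction would be the mean of y for every x.
print(round(predict_y(20, a, b, y_mean, False), 3))  # 3.117
```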
Coefficient of Determination, r²

○ The coefficient of determination is the proportion of the total variation in the dependent variable that is explained by variation in the independent variable (the ratio of the explained variation to the total variation).

○ The coefficient of determination is also called R-squared and is denoted r², where 0 ≤ r² ≤ 1.

* To obtain r², square the correlation coefficient r.
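The ratio in the definition can be written out explicitly. The symbols below are assumptions introduced for this illustration and do not appear on the slides: ŷ_i is the value predicted by the regression line for the i-th observation, and ȳ is the mean of the observed y values.

```latex
r^2 = \frac{\text{explained variation}}{\text{total variation}}
    = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}
```

For a least squares line with one independent variable, this ratio equals the square of the correlation coefficient r, which is why squaring r is enough in practice.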
Interpretation of r²:

(r² × 100)% of the variation in y is explained by x.

Example 2: Absent/Final Grade

The correlation coefficient between the number of times absent and the final grade is r = −0.9442. The coefficient of determination is

r² = (−0.9442)² = 0.8915

Interpretation: About 89.15% of the variation in final grades can be explained by the number of times a student is absent.

Note: The other 10.85% is unexplained and may be due to sampling error or to other variables such as intelligence, amount of time studied, etc.
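The arithmetic in this example is small enough to check in a few lines. The sketch below squares the quoted value of r and splits the variation into explained and unexplained percentages; the variable names are illustrative.

```python
r = -0.9442                                       # correlation coefficient from the slide
r_squared = r ** 2                                # coefficient of determination
explained_pct = round(r_squared * 100, 2)         # share of variation explained by x
unexplained_pct = round(100 - explained_pct, 2)   # remaining share

print(round(r_squared, 4), explained_pct, unexplained_pct)  # 0.8915 89.15 10.85
```

The sign of r disappears when it is squared, so r² alone does not say whether the relationship is positive or negative.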