Chapter 3 - Regression
§ Response Variable:
• This is the variable you want to predict.
• This variable is sometimes called the dependent variable because it depends on the
other variable.
• This will always be known as y.
§ Explanatory Variable:
• This is the variable we will use to make the prediction.
• This will always be known as x.
Ø Identify the two variables you want to study by determining if there is a relationship between them.
o Identify the type of variable: discrete, continuous, or categorical.
o Identify which variable would be the response variable and which one is the explanatory variable.
o Identify which variable will be labeled x and which variable will be labeled y.
o Example 1: A small business wants to predict the number of items sold per month based on the amount
spent on advertising per month.
o Example 2: The university is going to use the temperature at kickoff to predict the number of hot
chocolates sold during a football game.
Ø Determining if there is a Relationship Between Two Quantitative Variables
o An association exists between two variables if particular values of one variable are more likely to occur
with certain values of the other variable.
o Positive Relationship:
§ As the values of x increase, the values of y tend to increase.
o Negative Relationship:
§ As the values of x increase, the values of y tend to decrease.
o No Clear Relationship:
§ As the values of x increase, the values of y tend to neither increase nor decrease in any consistent way.
Ø Graphical Way to Determine if there is a Relationship
o Example 2: The university is going to use the temperature at kickoff to predict the number of hot
chocolates sold during a football game.
o In this class we are only going to look for a linear relationship between two variables.
o There are other types of relationships (patterns) that exist, but we will only study linear relationships.
§ Range of values of the correlation coefficient, r: from -1 to 1.
§ Calculation: Use StatCrunch. Stat-> Summary Stats -> Correlation. Select both the explanatory
and the response variable.
§ Determining if the relationship is positive or negative:
• If the value of the correlation coefficient is positive, then as x increases, y tends to
increase.
• If the value of the correlation coefficient is negative, then as x increases, y tends to
decrease.
• The closer the value is to 0, the weaker the linear relationship (the further from a
straight line pattern).
• Let's call |r| > 0.80 a strong linear relationship.
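The course uses StatCrunch for this calculation. As a rough illustration of what the software computes, here is a minimal Python sketch of the correlation coefficient. The data values and the function name `correlation` are invented for illustration and do not come from the notes.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration (not from the notes):
temperature = [30, 40, 50, 60, 70]          # temperature at kickoff (x)
hot_chocolates = [500, 420, 330, 260, 150]  # hot chocolates sold (y)

r = correlation(temperature, hot_chocolates)
direction = "positive" if r > 0 else "negative"
# The 0.80 cutoff is applied to the size of r, ignoring its sign
strength = "strong" if abs(r) > 0.80 else "not strong"
print(f"r = {r:.3f}: {direction}, {strength} linear relationship")
```

With this made-up data, sales fall as the temperature rises, so r comes out negative and close to -1.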
o Calculate the coefficient of correlation and determine if there appears to be a positive, negative, or no
clear relationship between the variables.
§ Use StatCrunch. Stat-> Summary Stats -> Correlation. Select both the explanatory and the
response variable.
o Example 1: A small business wants to predict the number of items sold per month based on the amount
spent on advertising per month.
o Example 2: The university is going to use the temperature at kickoff to predict the number of hot
chocolates sold during a football game.
• Slope =
• As the value of x increases by 1, the value of y will .
§ What do we expect the scatterplot to look like?
• relationship exists.
• It would be a number.
Ø Finding the line of best fit:
o StatCrunch will be used to find the equation for the line of best fit through the data.
o Method is called Least Squares Regression
§ It is not the line that passes through the most points.
§ It is the line that minimizes the sum of the squared differences between the points (the true
values) and the regression line (the predicted values).
o The difference between the true value (y) and the predicted value (ŷ) is called the residual.
o The equation for the line of best fit can be used to predict y by plugging in a value of the
explanatory variable, as long as the value of x is within the range of the observed data.
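To make the least squares idea concrete, the following Python sketch fits a line, makes a prediction at x = 400, and computes a residual. StatCrunch remains the tool used in this class. The data set and the function name `least_squares` are hypothetical. Only the $400 spending level and the observed value of 320 are taken from the exercise in these notes.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line: y-hat = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical monthly data, invented for illustration (not from the notes):
advertising = [100, 200, 300, 500, 600, 700]  # dollars spent (x)
items_sold = [100, 150, 230, 370, 430, 520]   # items sold (y)

intercept, slope = least_squares(advertising, items_sold)

x_new = 400
# Only predict for x values inside the range of the observed data
if not (min(advertising) <= x_new <= max(advertising)):
    raise ValueError("x is outside the range of the observed data")
predicted = intercept + slope * x_new

observed = 320                   # true value given in the exercise
residual = observed - predicted  # residual = true value - predicted value
print(f"y-hat = {intercept:.2f} + {slope:.2f}x")
print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
```

A positive residual means the line under-predicted the true value, and a negative residual means it over-predicted.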
§ Use StatCrunch to generate the equation for the line of best fit.
§ Use the prediction equation to predict the number of items sold if the amount spent on
advertising was $400.
§ If the true number of items sold when the business spent $400 on advertising was 320, calculate
the residual.
o Example 2: The university is going to use the temperature at kickoff to predict the number of hot
chocolates sold during a football game.
§ Use StatCrunch to generate the equation for the line of best fit.
§ Use the prediction equation to predict the number of hot chocolates sold during a football game
if the temperature at kickoff is 55°.
Ø R-squared:
o The percent of variability we observe in y that is
due to the linear relationship between x and y.
o Denoted r²
o Must fall between 0% and 100%.
o Can be found using software or by squaring the correlation coefficient, r.
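The two routes to r² mentioned above can be checked against each other. The Python sketch below computes r² once by squaring r and once as 1 minus the ratio of leftover (residual) variability to total variability in y. The data are invented for illustration, and the variable names are assumptions rather than anything from StatCrunch.

```python
import math

# Hypothetical data, invented for illustration (not from the notes):
xs = [30, 40, 50, 60, 70]       # temperature at kickoff
ys = [500, 420, 330, 260, 150]  # hot chocolates sold

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variability in y

# Route 1: square the correlation coefficient
r = sxy / math.sqrt(sxx * syy)
r2_from_r = r ** 2

# Route 2: fit the least squares line and compare leftover to total variability
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r2_from_fit = 1 - sse / syy

print(f"r^2 from r:   {r2_from_r:.1%}")
print(f"r^2 from fit: {r2_from_fit:.1%}")
```

Both routes should agree up to rounding, and the result always lands between 0% and 100%.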