Rec 4B - Regression - n-1
For this recitation (and all future recitations), be sure to show all
formulas, calculations, and units, where appropriate.
We are comparing years of education and hours on the internet in the last month, to
see if a relationship exists. If a relationship does exist, we want to predict Internet use
using education level. The output is given below. Assume a scatterplot shows a linear
pattern.
5. Identify what the units of the slope are for the regression line that was calculated
here.
8. Use the line to predict Internet use for someone with 16 years of education.
9. The computer output for the linear regression of the Internet/Education data is
shown below. Use this output to find the equation of the best-fitting line, and use it
to verify the equation of the best-fitting line you found earlier in this
assignment.
STAT 1430 Recitation 4B Regression
10. For what education levels is it appropriate to use this line to make predictions? Use
the statistics given in the problem to answer this. Hint: Avoid Extrapolation!
Suppose that the price (in $thousands) and size (in square feet) of a random sample of
houses in Viroqua, Wisconsin were analyzed by a new statistician using Minitab. The
group plans to use the data to help set prices for homes based on their size.
Variable   Q3      Maximum
Size       2271    2595
Price      269.8   315.0
Correlation: 0.9041
Regression Analysis
Predictor Coef SE Coef T P-value
Constant -90.88 52.62 -1.73 0.12
Size 0.15556 0.02605 5.97 0.00
11. True or False: Because there is a minus sign on 90.88, we know that the slope of the
best fitting line here is negative.
12. Based on the above output, a scatterplot of this data set that will be used for
prediction purposes would have which variable on which axis?
a. Price on X axis, Size on Y axis
b. Size on X axis, Price on Y axis
c. It doesn’t matter. The regression line won’t change if you switched X and Y
d. “Size” variable on the X axis and “Constant” variable on the Y axis
13. What is the equation of the best-fitting regression line? Find this in two ways. First,
using the regression analysis part of the output, and then using the descriptive
statistics part of the output.
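For the second approach, the standard relationships between the least-squares line and the descriptive statistics can be written as below. This is a sketch of the usual textbook formulas, assuming x = Size, y = Price, and r = 0.9041 from the output; the means and standard deviations are not shown in this excerpt, so they are left as symbols.

```latex
b = r \cdot \frac{s_y}{s_x}, \qquad
a = \bar{y} - b\,\bar{x}, \qquad
\hat{y} = a + b\,x
```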
14. Does the Y-intercept have an interpretation here? Why or why not?
15. For each square foot increase in house size, how much does the price increase (in
dollars) on average?
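The unit conversion behind this question can be sketched in a few lines of Python. This is only an illustration, assuming the Constant and Size coefficients from the regression output above; the function name and the choice of 2271 square feet (the Q3 size from the output) are illustrative, not part of the handout.

```python
# Coefficients copied from the regression output (price is in $thousands).
INTERCEPT = -90.88   # thousands of dollars
SLOPE = 0.15556      # thousands of dollars per square foot


def predict_price_thousands(size_sqft):
    """Predicted price in $thousands for a house of the given size."""
    return INTERCEPT + SLOPE * size_sqft


# Convert the slope from $thousands per square foot to dollars per square foot.
slope_dollars = SLOPE * 1000

# Example prediction at a size inside the observed range (Q3 of Size).
example_price = predict_price_thousands(2271)

print(f"Slope: ${slope_dollars:.2f} per additional square foot")
print(f"Predicted price at 2271 sq ft: ${example_price:.2f} thousand")
```

Because price is recorded in thousands of dollars, the slope has to be multiplied by 1000 before it can be read as dollars per square foot.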
Here are data for calories and salt content (milligrams of sodium) in 17 brands of
meat hot dogs:
16. For what range of sodium can you make a good prediction about calories?
17. A computer found the regression equation for the above problem is
Calories = 61.6 + 0.232 * Sodium (mg). How do we interpret the slope of this line?
a. As the amount of sodium increases by 1 milligram, the calories increase by 0.232
b. As the amount of sodium increases by 1 milligram, the calories increase by 61.6
c. As the amount of calories increases by 1, the sodium increases by 0.232
milligrams
d. As the amount of calories increases by 1, the sodium increases by 61.6
milligrams.
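One way to see what the slope means is to compare two predictions whose sodium values differ by exactly 1 milligram. The sketch below assumes the equation given in question 17; the sodium value of 400 mg is a hypothetical input chosen for illustration, since the data table is not reproduced here.

```python
def predicted_calories(sodium_mg):
    """Predicted calories from the regression equation in question 17."""
    return 61.6 + 0.232 * sodium_mg


# Hypothetical sodium value; any value gives the same difference.
sodium = 400
difference = predicted_calories(sodium + 1) - predicted_calories(sodium)

print(f"Change in predicted calories for +1 mg sodium: {difference:.3f}")
```

The difference between the two predictions equals the coefficient on sodium, regardless of the starting value.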
18. There is an influential point here. What are its approximate values for sodium and
calories? Will this point have a large residual? Why or why not?
Suppose the age of a woman is strongly correlated with the age of her husband when
both are marrying for the 2nd time. (Scatterplot shows a linear relationship.)
20. There is a positive linear relationship between these two variables. Explain why this
makes sense in the context of this problem.
21. Suppose you want to use the woman’s age to predict her husband’s age. Before
calculating the slope, do you think it would be >1 or <1? (Think about what slope
means here – change in Y per one unit change in X.) Then find the slope of the
regression line.
22. Suppose you want to use the husband’s age to predict the woman’s age. Before
calculating the slope, will the slope change? If so, how? Now find the slope of the
regression line to see if your thoughts are verified.
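The effect of switching the roles of X and Y can be summarized with the usual slope formula. This is a sketch under assumed notation: s_w and s_h stand for the standard deviations of the women's and husbands' ages and r for their correlation, none of which are given numerically in this excerpt.

```latex
b_{\text{husband on wife}} = r \cdot \frac{s_h}{s_w}, \qquad
b_{\text{wife on husband}} = r \cdot \frac{s_w}{s_h}, \qquad
b_{\text{husband on wife}} \cdot b_{\text{wife on husband}} = r^2
```

Since the product of the two slopes equals r squared, the two slopes are reciprocals of each other only when the correlation is perfect.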
Let X = quiz 1 score and Y = quiz 2 score. Suppose 5 students have the following
scores, given as quiz 1 score, quiz 2 score (or as X, Y):
Student 1: 10, 10
Student 2: 9, 8
Student 3: 6, 8
Student 4: 5, 9
Student 5: 8, 9
The regression analysis shows R-squared = 14%. The results are below. (Assume the
scatterplot shows a linear relationship.)
            Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept   7.651163      1.689611        4.528357  0.020147  2.274      13.028
Quiz 1      0.151162      0.215978        0.699896  0.534383  -0.536     0.839
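As a check on the output above, the least-squares coefficients can be recomputed directly from the five score pairs. The following is a minimal Python sketch using the textbook formulas (slope = Sxy / Sxx, intercept = y-bar minus slope times x-bar); the variable names are illustrative and not part of the handout.

```python
# Quiz scores for the five students, as (quiz 1, quiz 2) = (X, Y).
xs = [10, 9, 6, 5, 8]
ys = [10, 8, 8, 9, 9]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares slope and intercept.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

print(f"Intercept: {intercept:.6f}")
print(f"Slope (Quiz 1): {slope:.6f}")
```

Up to rounding, the two printed values should agree with the Coefficients column of the output.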
25. Find each of the 5 residuals for this data set. (Remember what a residual is and how it
is calculated from your lecture notes.) [This is more challenging; try to answer
without your TA’s help – use those around you AND your lecture notes!]
26. Sketch your own residual plot for this data set and interpret. There are 5 points so
there should be 5 residuals.
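A residual is the observed Y value minus the Y value predicted by the line. The sketch below shows one way to compute the five residuals in Python; it refits the line from the data rather than copying the rounded coefficients, and the function name `fit_line` is an illustrative choice, not something from the handout.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Quiz scores for the five students, as (quiz 1, quiz 2) = (X, Y).
xs = [10, 9, 6, 5, 8]
ys = [10, 8, 8, 9, 9]

intercept, slope = fit_line(xs, ys)

# Residual = observed Y minus predicted Y, one per student.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for student, (x, res) in enumerate(zip(xs, residuals), start=1):
    print(f"Student {student}: quiz 1 = {x}, residual = {res:.3f}")

# For a least-squares line with an intercept, the residuals sum to zero.
print(f"Sum of residuals: {sum(residuals):.6f}")
```

Plotting each residual against its quiz 1 score gives the residual plot asked for in question 26; a sum that is not essentially zero would signal an arithmetic slip.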