Homework 3

Valparaiso University

STAT 240
Name: Alexis Camargo
Date: January, 21

Honor Code: “I have neither given nor received, nor have I tolerated others’ use of unauthorized aid.”

1.

(a) Plot the team value against revenue. Describe the form, direction, and strength of the
relationship, using the correlation to quantify the strength.

- Form: Linear
- Direction: Positive
- Strength: Strong (Correlation: 0.9625)

(b) Plot the team value against debt. Describe the form, direction, and strength of the relationship,
using the correlation to quantify the strength.

- Form: Linear
- Direction: Positive
- Strength: Weak (Correlation: 0.072344)

(c) Plot the team value against operating income. Describe the form, direction, and strength of the
relationship, using the correlation to quantify the strength.

- Form: Linear
- Direction: Positive
- Strength: Strong (Correlation: 0.8909)

(d) Which variable is the best at explaining the response in team value for the 32 teams in the NFL?
Which variable is the worst?

Revenue is the best variable for explaining team value (r = 0.9625), and debt is the worst (r = 0.072344).

2. Each of the following statements contains a blunder regarding correlation. Explain in each case what
is wrong.

(a) “The correlation between the weight and height of baseball players was found to be r = 0.49 lbs/ft.”

ANS: Correlation has no units, so reporting r in lbs/ft is wrong.

(b) “We found a high correlation (r = 1.19) between students’ ratings of faculty teaching and ratings
made by other faculty members.”

ANS: Correlation must always be between -1 and 1, so r = 1.19 is impossible.

(c) “The correlation between the gender of a group of students and the color of their cell phone was r =
0.23.”

ANS: Correlation can only be used for quantitative variables, and gender and cell phone color are both categorical.
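
The points in (a) and (b) can be checked numerically. The sketch below is only an illustration: the heights and weights are made-up values, not data from the homework, and `pearson_r` is a hypothetical helper written for this example. It computes r in feet and pounds, then again after converting to centimeters and kilograms, to show that the value carries no units and stays within -1 and 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up heights (ft) and weights (lb), for illustration only.
heights_ft = [5.8, 6.0, 6.1, 6.3, 6.5]
weights_lb = [180, 195, 190, 215, 230]
r_imperial = pearson_r(heights_ft, weights_lb)

# Same players measured in centimeters and kilograms.
heights_cm = [h * 30.48 for h in heights_ft]
weights_kg = [w * 0.45359237 for w in weights_lb]
r_metric = pearson_r(heights_cm, weights_kg)

# The two values agree up to floating-point rounding.
print(round(r_imperial, 4), round(r_metric, 4))
```

Because each deviation is divided out by the spread of its own variable, any change of units cancels, which is why r is a pure number.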

3.

(a) Make a scatterplot of Y versus X and superimpose the least-squares regression line and r^2 value.

(b) Add an additional observation to the data set with X = 40 and Y = 44. Make a new scatterplot of Y
versus X and superimpose the least-squares regression line and r^2 value.

(c) Add a different observation to the data set with X = 40 and Y = 30 (do not leave in the additional
observation from part (b)). Make a new scatterplot of Y versus X and superimpose the least-squares
regression line and r^2 value.

(d) Write a short paragraph comparing and contrasting the effects of the two outliers added in parts (b)
and (c) on the regression line and r^2 value.

Outliers in a data set can substantially change the correlation, the regression line (especially its
slope), and r^2, and therefore any predictions made from the fitted equation. In part (b), the added
point changed r^2 and the regression equation very little. In part (c), however, the added point
changed both the regression equation and the coefficient of determination substantially.
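
The comparison in (d) can be reproduced in a few lines. This is a minimal sketch under stated assumptions: the base data below are invented (the homework's actual data set is not shown here) and were chosen to follow roughly y = x + 4, so that the point (40, 44) falls near the existing trend while (40, 30) falls well below it. `fit_line` is a hypothetical helper written for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)

# Invented base data (assumption), roughly following y = x + 4.
base_x = [10, 12, 14, 16, 18, 20, 22, 24]
base_y = [14.5, 15.2, 18.9, 19.4, 22.8, 23.1, 26.6, 27.7]

fits = {
    "base": fit_line(base_x, base_y),
    "(b) add (40, 44)": fit_line(base_x + [40], base_y + [44]),
    "(c) add (40, 30)": fit_line(base_x + [40], base_y + [30]),
}

for label, (slope, intercept, r2) in fits.items():
    print(f"{label}: y-hat = {intercept:.2f} + {slope:.3f}x, r^2 = {r2:.3f}")
```

With these invented values, both added points sit far from the other X values, but only the one that departs from the existing trend pulls the slope and r^2 noticeably away from the base fit.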

4. A study among first-year students at a state university showed that in general students who skipped
more classes earned lower grades. Number of classes skipped explained 36% of the variation in grade
index among the students. What is the numerical value of the correlation between number of classes
skipped and grade index?

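Since r^2 = 0.36, r is either 0.6 or -0.6. Students who skipped more classes earned lower grades, so the
relationship is negative and the correlation between number of classes skipped and grade index is r = -0.6.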
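
The arithmetic can be checked with a short sketch. The variable names are chosen for illustration only. The magnitude of r comes from the 36% of variation explained, and the sign comes from the direction stated in the problem (more skipped classes go with lower grades).

```python
from math import sqrt

r_squared = 0.36   # fraction of variation explained, from the problem
direction = -1     # negative association: more skips, lower grades

# r has the magnitude sqrt(r^2) and the sign of the association.
r = direction * sqrt(r_squared)
print(round(r, 2))
```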

5. The Excel file STAT240 Hwk3 P5 Anscombe.xls contains four sets of data that were prepared by
statistician Frank Anscombe to illustrate the dangers of calculating without first plotting the data. All
four data sets have approximately the same correlation of r = 0.816 and approximately the same
regression equation of ŷ = 3 + 0.5x. However, that does not mean that the relationship between X and Y
is approximately the same for each data set!

(a) Make a scatterplot of Y versus X and superimpose the least-squares regression line for each of the
four data sets.
(b) For only one of the data sets is it appropriate to use correlation and regression analyses. Identify
which data set correlation and regression is appropriate for and explain why. Then for each of the other
three data sets, clearly explain why correlation and regression analyses are not appropriate for that
particular data set.

Correlation and regression analyses are appropriate only for Ya vs. Xa: its residual plot shows no
pattern, so a straight line describes the data well. For Yb vs. Xb the relationship is a curve, not a
line. For Yc vs. Xc the residual plot shows a pattern, so a linear fit is not appropriate. For Yd vs. Xd
the points form a vertical line plus a single outlier, so the fitted line is determined almost entirely
by that one point.
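
The nearly identical summary statistics can be confirmed with a short sketch. It uses the values from Anscombe's published quartet, which may not match the Excel file exactly, and the mapping of the labels a through d onto the published sets I through IV is an assumption. `fit_line` is a hypothetical helper written for this example.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)

# Published Anscombe values; labels a-d are an assumed mapping to sets I-IV.
x_abc = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x_d = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "a": (x_abc, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "b": (x_abc, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "c": (x_abc, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "d": (x_d, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {name: fit_line(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (slope, intercept, r) in summary.items():
    print(f"set {name}: y-hat = {intercept:.2f} + {slope:.3f}x, r = {r:.3f}")
```

The numbers alone cannot separate the four sets, which is why the scatterplots and residual plots are needed to judge whether a line is a reasonable model.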
